K07F5.15: ancient, conserved, and not a pseudogene!

This is a cautionary tale about being too hasty to convert a CDS to a Pseudogene…

K07F5.15 has a problem with its gene structure.

It is curated as having three coding exons, encoding a 73 aa protein. However, there appears to be a clear frameshift where the CDS ends in the third exon: beyond that point lie the rest of the third exon and a fourth exon, both of which have very good coding hexamer potentials.
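To make the "looks like a frameshift" pattern concrete, here is a minimal Python sketch. It is only an illustration, not the curation tooling actually used: the function names and the toy sequence are invented for the example, and it checks for an open alternative reading frame rather than computing hexamer coding potential. It translates frame 0 up to its first stop codon, then reports which of the other two frames stay free of stop codons from that point to the end of the sequence.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna in the given frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(frame, len(dna) - 2, 3)
    )


def open_frames_after_stop(dna):
    """Return (frame-0 protein up to its first stop, frames still open downstream).

    A frame counts as 'open' if it has no stop codon between the
    frame-0 stop and the end of the sequence.
    """
    protein = translate(dna, 0)
    stop = protein.find("*")
    if stop == -1:
        return protein, []
    downstream = dna[stop * 3:]  # starts at the first base of the stop codon
    open_frames = [f for f in (1, 2) if "*" not in translate(downstream, f)]
    return protein[:stop], open_frames


# Hypothetical toy sequence (not K07F5.15): frame 0 stops after three codons,
# while frame 1 remains open to the end of the sequence.
toy = "ATGGCTAAATAAGCTGATGCTGC"
print(open_frames_after_stop(toy))
```

An open frame downstream of the stop is only a hint, of course. As the rest of this story shows, it does not by itself mean that the stop is wrong.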

Extensive EST coverage shows that there is no sequencing error in the region of the frameshift. (No sequencing error anywhere in this gene, in fact.)

It was so clearly a frameshift that I nearly converted it into a pseudogene, but because there were no duplicates of this gene in C. elegans, I took a look at the orthologs in some other nematode species.
They all share the same frameshift.

Looking at BLASTP hits, there are a lot of very similar proteins in the database that are all about 74 aa long (including human transmembrane protein 167B, NM_020141.3).

Looking at TBLASTN hits against genomes, there are a lot of species, including human, which have their CDS structure terminated at this apparent ‘frameshift’.

I thought then that there might be some well-conserved mRNA editing going on, but there is no homology between the possible translation of the region after the ‘frameshift’ in C. elegans and the human mRNA NM_020141.3.

The gene has been left as a coding gene. :slight_smile:

Gary