C34E10.10 has an unusual 5' end

C34E10.10 is a well-conserved gene. It is an ortholog of human UTP23 (small subunit processome component) and is involved in embryo development.

In nematodes, it appears to have a conserved coding exon at the 5’ end of the transcribed region. This exon cannot be incorporated into the coding structure, because joining it to the rest of the coding sequence would introduce one or more premature STOP codons and/or a frameshift.
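One quick way to check a candidate exon for in-frame STOP codons is to translate it in all three forward frames and note where the stops fall. The following is only a minimal sketch: `toy_dna` is a made-up sequence for illustration, not the real C34E10.10 sequence, and the function names are my own.

```python
# Standard genetic code, built in the usual TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna, frame=0):
    """Translate a DNA string in the given forward frame (0, 1 or 2).

    Stop codons are written as '*'; codons with unknown bases become 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stops_by_frame(dna):
    """Return {frame: [codon indices that are stop codons]} for frames 0-2."""
    return {
        frame: [i for i, aa in enumerate(translate(dna, frame)) if aa == "*"]
        for frame in range(3)
    }


# Hypothetical sequence, purely for illustration.
toy_dna = "ATGCCAAAATAAATTCAA"
for frame, stops in stops_by_frame(toy_dna).items():
    print(frame, translate(toy_dna, frame), stops)
```

Running the same check on the genomic sequence spanning the 5’ exon and the next annotated exon would show which frames stay open across the junction, assuming the sequence is oriented on the coding strand.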

The translated sequence of this “exon” is:

`remanei_CRE29368 MPKSQQ--RLLNYATSIAKCPTETTVYGSCVSAQAERIKQDDCNAEFKKLIDCVTKNLKK
brenneri_CBN15338 MPKAQQQ-RLLNYATSIAKCPTETSAYGSCVSAQAERIKQGDCNAEFKKLIDCVTKNLKK
elegans_C34E10.10 MPKTQQ--RLLNYATSIAKCPTETSNYGSCVSVQAERIKQGDCSAEFRKLIDCVTKNLKK
briggsae_CBG21174 MPNASNPQRLLSYATSIAKCPTETVAYGSCVSSQAERIKQGDCSAEFKKLIDCVTKNLKK
                  **::.:  ***.************  ****** *******.**.***:************

remanei_CRE29368  K----KMKVK
brenneri_CBN15338 K----KMKVK
elegans_C34E10.10 K*IQ*TMKVK
briggsae_CBG21174 K----KNEGE
                  *    . : :
`
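To put a rough number on how well this region is conserved, the pairwise identity between the aligned peptides can be counted column by column. The sketch below uses only the first alignment block and assumes that the dash characters there stand for gap columns, so that every row is 60 columns wide. The `identity` helper is my own illustration, not output from any alignment tool.

```python
from itertools import combinations

# First alignment block from above. Gaps are written as '-' on the assumption
# that each dash character in the post stands for gap columns (60 per row).
ALIGNED = {
    "remanei_CRE29368":  "MPKSQQ--RLLNYATSIAKCPTETTVYGSCVSAQAERIKQDDCNAEFKKLIDCVTKNLKK",
    "brenneri_CBN15338": "MPKAQQQ-RLLNYATSIAKCPTETSAYGSCVSAQAERIKQGDCNAEFKKLIDCVTKNLKK",
    "elegans_C34E10.10": "MPKTQQ--RLLNYATSIAKCPTETSNYGSCVSVQAERIKQGDCSAEFRKLIDCVTKNLKK",
    "briggsae_CBG21174": "MPNASNPQRLLSYATSIAKCPTETVAYGSCVSSQAERIKQGDCSAEFKKLIDCVTKNLKK",
}


def identity(a, b):
    """Return (identical columns, columns compared) for two aligned rows.

    Columns where either row has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in pairs)
    return matches, len(pairs)


for (name1, row1), (name2, row2) in combinations(ALIGNED.items(), 2):
    matches, compared = identity(row1, row2)
    pct = 100 * matches / compared
    print(f"{name1} vs {name2}: {matches}/{compared} identical ({pct:.1f}%)")
```

Identity over ungapped columns is a crude measure, but it is enough to show whether the region looks like conserved coding sequence rather than UTR.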

The gene models for the orthologs in some other nematode species (C. remanei, C. brenneri) do join this 5’ coding region to the rest of the gene structure, but only because the curator has been creative and used an imaginary splice site!

There is nothing in the literature that would indicate there is a “read-through” STOP codon or programmed ribosomal slippage occurring in this gene.

Mass-spectrometry peptides have been found in C. elegans, which indicates that this exon is coding.

My question is: is this region part of the coding sequence of this gene and, if so, what mechanism joins it to the rest of the coding sequence?