Just 5' of C42C1.10

I’ve just added an isoform to C42C1.10 by inserting a new exon just 5’ of the old gene structure and splicing into the start of C42C1.10. This isoform then ends after translating a short part of the old C42C1.10 first exon in a different frame.

This structure is rather speculative, and I would appreciate any lab work that could confirm it.

Tiling array expression data and WABA coding potential both support this new first exon. The Hillier high-throughput transcriptome data also show expression in this region, along with evidence for a transcription start site at the beginning of the new first exon.

There is no evidence of splicing from this exon to the start of the old C42C1.10 structure. So it is possible that this should be a separate gene in this operon rather than an isoform, albeit with unusually close spacing between it and C42C1.10.

If it is a separate gene, its coding region would cover the whole of this ORF. However, the WABA coding potential indicates that only the part of the ORF curated in this first exon is coding. Also, as a separate gene its coding region would overlap C42C1.10 in a different frame, which is very unusual.

This new first exon appears to be a copy of the 5’ end of F47B8.10, so it may be an expressed pseudogenic fragment. However, a mass-spec peptide maps to this position, which indicates that it is translated.

I am not happy with the current structure for two reasons. The second exon of this isoform is read in a different frame from the one used by the old C42C1.10 first exon, and the new isoform appears to duplicate the 5’ end of another gene. But the evidence for expression in this region is compelling, and there is plausible evidence for translation.

This new isoform, C42C1.10b, will be available in release WS201.



The other Caenorhabditis species each seem to have a BLASTX hit in this region. Could the gene structures of these proteins shed any light on what is happening here?

I see the rationale for building the isoform this way, but I don’t like the splicing that makes it overlap the pre-existing isoform in a different frame. Will the second exon of the new isoform overlap the ESTs? In other words, is this a gene that will end up flagged as ‘partially confirmed’?

And I guess the other thing that will always be interesting to look at now is whether the sequences of other strains reveal anything in this region.




It is reassuring that the other Caenorhabditis species have this coding region. It indicates that this is not a recent pseudogenic fragment but a conserved feature that is being selected for.

The C. remanei, C. brenneri and C. japonica homologs of this region all appear to be isoforms that splice to the second exon, but out of frame relative to the existing main gene structure.

I think I was right to make it an isoform for now. But I would not be surprised if this turns out not to be the correct gene structure, and some unusual splicing is occurring (for example, to get the frame back on track).