Unusual gene structures

This is a short list of some of my favourite unusual gene structures in C. elegans. Some of these still pose problems for WormBase curators trying to model their structure and final protein products accurately.

Please add your own favourites!

Exons on opposite strands

eri-6 and eri-7, which lie on opposite strands, produce separate pre-messenger RNAs (pre-mRNAs) that are trans-spliced to form a single functional mRNA, eri-6/7.
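To show what "exons on opposite strands" means at the sequence level, here is a minimal Python sketch. It assumes made-up genomic sequence and coordinates (not the real eri-6 or eri-7 positions) and ignores splice signals and the splicing machinery entirely. The point is only that an exon annotated on the minus strand is read as the reverse complement of the forward-strand sequence before it is joined to a plus-strand exon. The function names are hypothetical helpers.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exon_sequence(genome: str, start: int, end: int, strand: str) -> str:
    """Extract one exon (0-based, end-exclusive) in its transcribed orientation."""
    segment = genome[start:end]
    if strand == "+":
        return segment
    if strand == "-":
        # Minus-strand exons are read from the reverse complement of the
        # forward-strand sequence.
        return reverse_complement(segment)
    raise ValueError(f"unknown strand: {strand!r}")


def join_exons(genome: str, exons: list[tuple[int, int, str]]) -> str:
    """Concatenate exons in the order given, each read on its own strand."""
    return "".join(exon_sequence(genome, s, e, strand) for s, e, strand in exons)


# Toy example: one plus-strand exon joined to one minus-strand exon.
mrna = join_exons("ATGAAACCCGGGTTTCAT", [(0, 6, "+"), (12, 18, "-")])
print(mrna)  # ATGAAAATGAAA
```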

Two genes one locus

The cha-1 (ZC416.8b) and unc-17 (ZC416.8a) transcripts appear to be derived by alternative splicing of a common precursor. The two genes use a common 5’ untranslated exon; the remainder of the unc-17 gene is nested within the long first intron of cha-1.

ubl-1 - Polyubiquitin

Ubiquitin is a 76-amino-acid protein that is highly conserved across eukaryotes.
At this locus ubiquitin is encoded as tandem head-to-tail repeats: the mRNA is translated into a polyprotein, which is then cleaved proteolytically to yield free ubiquitin monomers.
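The processing step can be pictured with a short Python sketch, assuming a simplified model in which every repeat is exactly 76 residues long and ends in Gly-Gly, the residues after which the precursor is cut. The human ubiquitin sequence is used only as an illustrative repeat unit, and the one-residue C-terminal extension in the example is hypothetical; real cleavage is carried out by deubiquitinating enzymes, not by fixed-length slicing.

```python
UBIQUITIN_LENGTH = 76  # residues per ubiquitin monomer


def cleave_polyubiquitin(polyprotein: str) -> list[str]:
    """Split a head-to-tail polyubiquitin precursor into 76-residue monomers.

    Simplified model: every repeat is exactly 76 residues and ends in Gly-Gly.
    Residues left after the last complete repeat (a C-terminal extension)
    are discarded.
    """
    monomers = []
    for i in range(len(polyprotein) // UBIQUITIN_LENGTH):
        repeat = polyprotein[i * UBIQUITIN_LENGTH:(i + 1) * UBIQUITIN_LENGTH]
        # Cleavage happens after the C-terminal Gly-Gly of each repeat.
        if not repeat.endswith("GG"):
            raise ValueError(f"repeat {i + 1} does not end in Gly-Gly")
        monomers.append(repeat)
    return monomers


# Human ubiquitin sequence, used here only as an illustrative repeat unit.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)
# Three tandem repeats plus a hypothetical one-residue C-terminal extension.
monomers = cleave_polyubiquitin(UBIQUITIN * 3 + "N")
print(len(monomers))  # 3
```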

unc-49 encodes Multiple Subunits of a Heteromultimeric GABA Receptor

unc-49 contains a single copy of a GABA receptor N terminus, followed by three tandem copies of a GABA receptor C terminus. Using a single promoter, unc-49 generates three distinct GABA-A receptor-like subunits by splicing the N terminus to each of the three C-terminal repeats.
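The splicing pattern can be sketched in Python as one shared exon chain joined to each alternative C-terminal block in turn. The exon labels below are hypothetical placeholders, not the real unc-49 exon structure; the sketch shows only the combinatorial logic of one N terminus plus three C-terminal repeats giving three products.

```python
def splice_shared_n_terminus(
    n_terminal_exons: list[str],
    c_terminal_repeats: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return one exon chain per C-terminal repeat, each starting with the
    shared N-terminal exons (one promoter, alternative 3' portions)."""
    return {
        name: n_terminal_exons + repeat_exons
        for name, repeat_exons in c_terminal_repeats.items()
    }


# Hypothetical exon labels; not the real unc-49 exon structure.
isoforms = splice_shared_n_terminus(
    ["N1", "N2"],
    {"repeat1": ["C1a", "C1b"], "repeat2": ["C2a"], "repeat3": ["C3a", "C3b"]},
)
for name, exons in isoforms.items():
    print(name, "-".join(exons))
```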

Non-canonical splice sites

There are many cases of non-canonical splice sites in C. elegans genes.
Here is one well-characterised example:
The donor splice site of intron 7 of F01G12.5b starts with GC instead of GT.
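A simple way to flag such introns is to look at their terminal dinucleotides: canonical introns begin with GT and end with AG, while an intron like the one above begins with GC. The Python sketch below is a minimal check under that assumption; the intron sequences in it are hypothetical and are not taken from F01G12.5b, and real annotation would also consider the wider splice-site consensus.

```python
CANONICAL = ("GT", "AG")  # donor and acceptor dinucleotides of a canonical intron


def splice_dinucleotides(intron: str) -> tuple[str, str]:
    """Return the donor (5') and acceptor (3') dinucleotides of an intron."""
    intron = intron.upper()
    return intron[:2], intron[-2:]


def describe_intron(intron: str) -> str:
    """Label an intron as canonical GT-AG or report its non-canonical ends."""
    donor, acceptor = splice_dinucleotides(intron)
    if (donor, acceptor) == CANONICAL:
        return "canonical GT-AG"
    return f"non-canonical {donor}-{acceptor}"


# Hypothetical intron sequences (not taken from F01G12.5b).
for intron in ["GTAAGTTTTCAG", "gcaagtttttag"]:
    print(intron, describe_intron(intron))
```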

A review of non-canonical splice sites can be found in:

The biggest gene - titin


There is nothing particularly extraordinary about the gene ttn-1, except that it is large.

RNA editing

Apart from the targets of RNA editing by adr-1 and adr-2 in nervous tissue (and possibly vulval tissue), examples of RNA editing in C. elegans are hard to come by. The known targets of adr-1 and adr-2 are ZC239.6, Y6D11A.1, F56A8.7a, W03D8.2 and C54D1.5.

There also appears to be C-to-U RNA editing in gld-2.
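Both kinds of editing leave a characteristic mismatch when cDNA is compared with the sense strand of the genome: inosine produced by A-to-I editing (the activity of the ADAR family, to which adr-1 and adr-2 belong) is read as G, and uridine produced by C-to-U editing is read as T. The Python sketch below screens for those two mismatch types, assuming the two sequences are already aligned without gaps and on the same strand. Its hits are only candidates: SNPs and sequencing errors produce the same mismatches.

```python
# Genomic-to-cDNA mismatches that are signatures of the two editing types.
EDIT_SIGNATURES = {
    ("A", "G"): "A-to-I (read as G)",
    ("C", "T"): "C-to-U (read as T)",
}


def find_candidate_edits(genomic: str, cdna: str) -> list[tuple[int, str]]:
    """Return (0-based position, edit type) for each editing-like mismatch.

    Assumes an ungapped alignment of equal-length sense-strand sequences.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned and of equal length")
    candidates = []
    for pos, (g, c) in enumerate(zip(genomic.upper(), cdna.upper())):
        label = EDIT_SIGNATURES.get((g, c))
        if label is not None:
            candidates.append((pos, label))
    return candidates


# Hypothetical sequences: two A-to-G mismatches, other positions identical.
print(find_candidate_edits("AACGTA", "AGCGTG"))
```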

Recoding frameshifts


There is a good example of a programmed frameshift found in most metazoans: the antizyme of ornithine decarboxylase (odc-1).



There is one selenoprotein in C. elegans, trxr-1, in which a UGA codon is recoded as selenocysteine rather than read as a stop.

Two genes one locus

lev-10 and eat-18

In the gene F02C12.1, a base at the end of exon 8 is skipped in the EST and RNA-Seq transcripts.

There is no evidence for a genomic sequencing error at this position; the genomic sequence quality is good in this region.

It is possible that this is an example of a programmed frameshift, but it will take further work to confirm this.
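Whatever the mechanism turns out to be, skipping a single base shifts the reading frame by one and changes every codon downstream of that point. The Python sketch below illustrates this on a hypothetical coding sequence (not F02C12.1); `codons` and `skip_base` are illustrative helpers, not WormBase tools.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def skip_base(seq: str, pos: int) -> str:
    """Return seq with the base at 0-based position pos removed."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical coding sequence; skipping position 5 shifts every later codon.
cds = "ATGGCCAAAGGGTTT"
print(codons(cds))                # ['ATG', 'GCC', 'AAA', 'GGG', 'TTT']
print(codons(skip_base(cds, 5)))  # ['ATG', 'GCA', 'AAG', 'GGT']
```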

Two genes, one locus

wars-2 and prx-10

Y65B4BL.1a and Y65B4BL.1b

Another pair of unrelated proteins encoded by non-overlapping isoforms from the same locus:
Y37E11AM.3a and Y37E11AM.3b

Another pair of unrelated proteins encoded by non-overlapping isoforms from the same locus:
T22B11.4a/b and T22B11.3. These are found together, with splicing between them, in C. briggsae, C. remanei and C. japonica, but not in C. brenneri.