cdh-4 (WBGene00000396) reference issues

PaulD · November 17, 2014, 5:19pm

cdh-4 appears to be a well-studied C. elegans gene involved in cell migration and cell fate. The problem is that this locus appears to suffer from two issues in what is currently the 13th intron (WS246).
While all the other exons and introns of the current annotation are very well supported, intron 13 appears to be completely unsupported. However, there is a smaller intron feature that IS supported by RNA-Seq spliced reads and has very strong splice sites.
The only problem is that including this intron not only throws the 3' end of the gene/transcript out of frame but also introduces a STOP codon. There is evidence for a genome sequence error in exon 12, but correcting that alone does not make the reference genome consistent with what the RNA-Seq data is telling us.

Usually we as curators would have no hesitation in reclassifying this as a pseudogene in the reference genome, but a number of publications report good phenotype data, which makes this reclassification problematic.

There could be a number of things going on here:

There are lab isolates in which this gene is functional, whereas the N2 reference contained a non-functional copy.
The reference genome contains multiple errors in this exon 12/intron 13 region.

From the literature, there have been attempts to elucidate the correct sequence of this gene, but both the old and the newer WormBase annotations have been used to bridge partial RT-PCR data together, so the problem region has been overlooked.

What is needed is for someone working on cdh-4 to do proper genomic PCR and RT-PCR on this gene to confirm both the genome sequence and the transcript in their N2 stock.

bdackley · November 18, 2014, 6:44pm

We have worked on cdh-4 and have some partial cDNA and genomic clones, so the gene region is actually transcribed and, based on the transgenome fosmid clone, also translated.
To be clear, I understand the issue you're discussing to be in the region where the genome predicts an alternate start site.
In the RNA-Seq data, the upstream exon appears to include the alternate start exon and then splice to the next exon, but that places the transcript out of frame (see attached file).
I would be happy to examine genomic DNA and cDNAs from this region in our strain. If I have misidentified where the issue is, please let me know.

PaulD · June 11, 2015, 10:21am

Dear @bdackley,

For some reason I never got a notification of your reply.

The region you have identified is correct. What I am interested in is confirmatory sequence data on the status of the intron in F25F2.2a that bridges over the alternative start exon, as it cannot be seen in the spliced RNA-Seq reads.

Paul

PaulD · June 9, 2016, 11:05am

Additional information: I have now (WS255) updated the annotation to remove the false second isoform and to include two single-bp artificial micro-introns.

This generates the correct amino acid sequence and collapses the second isoform into the primary transcript.

The image shows the problem region and the changes.
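For readers unfamiliar with the micro-intron approach, here is a minimal toy sketch of why skipping a single base can rescue a reading frame. It assumes each 1 bp micro-intron simply skips one extra base that the reference genome carries (for example, a sequencing insertion error). The sequence, coordinates, and the helpers `splice` and `translate` are hypothetical illustrations only; they are not the cdh-4/F25F2.2 region and not WormBase tooling.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(genomic: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        if start < pos or end <= start:
            raise ValueError(f"bad or overlapping intron {(start, end)}")
        pieces.append(genomic[pos:start])  # keep the exon before this intron
        pos = end
    pieces.append(genomic[pos:])  # keep the final exon
    return "".join(pieces)


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at (and including) the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# HYPOTHETICAL toy "genome": an intended ORF with two extra bases inserted
# (T at index 6, C at index 13), mimicking two single-base genome errors.
GENOMIC = "ATGGCTTAAGTGGCGCCAAATAA"
# HYPOTHETICAL 1 bp artificial micro-introns covering the two extra bases.
MICRO_INTRONS = [(6, 7), (13, 14)]

if __name__ == "__main__":
    print(translate(GENOMIC))                         # premature stop
    print(translate(splice(GENOMIC, MICRO_INTRONS)))  # frame restored
```

Run as written, the uncorrected sequence translates to `MA*` (the first extra base shifts the frame straight into a TAA stop), while splicing out both 1 bp micro-introns yields the full `MAKWAK*`. Removing only the first micro-intron gives `MAKWRQI` with no stop codon, so in this toy case both corrections are needed to recover the intended product.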