cdh-4 appears to be a well-studied C. elegans gene involved in cell migration and fate. The problem is that this locus appears to suffer from two
issues in what is currently the 13th intron (WS246).
While there is very good support for all of the other exons and introns of the current annotation, intron 13 appears to be completely unsupported. There is, however, a smaller intron feature that IS supported by RNASeq spliced read data and has very strong splice sites.
The only problem is that including this intron not only throws out the 3’ end of this gene/transcript but also introduces a STOP codon. There is evidence for a genome sequence error in exon 12, but correcting this alone does not make the reference genome accurately reflect what the RNASeq data is telling us.
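The effect of swapping one intron for a smaller one can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sequence is a made-up toy, not cdh-4, and the exon coordinates and the helper names `splice` and `first_inframe_stop` are hypothetical. It only shows how retaining extra sequence can both shift the downstream frame and expose an in-frame STOP.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into one spliced sequence."""
    return "".join(genomic[start:end] for start, end in exons)

def first_inframe_stop(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - len(cds) % 3, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

# Toy locus (NOT real cdh-4 sequence): two exons flanking an intron region.
exon_a = "ATGGCTGCA"          # 9 nt, in frame
small_intron = "GTAAGTTTCAG"  # 11 nt, the smaller intron (GT...AG)
extra = "TGACTAG"             # 7 nt, intronic in the annotated model only
exon_b = "GGTCCATAA"          # 9 nt, ends with the normal STOP
genomic = exon_a + small_intron + extra + exon_b

# Annotated model: the large intron removes small_intron + extra.
annotated = splice(genomic, [(0, 9), (27, 36)])
# Alternative model: only the smaller intron is removed.
alternative = splice(genomic, [(0, 9), (20, 36)])

for name, cds in (("annotated", annotated), ("alternative", alternative)):
    stop = first_inframe_stop(cds)
    in_frame = len(cds) % 3 == 0
    print(f"{name}: length={len(cds)}, first stop at codon {stop}, "
          f"length divisible by 3: {in_frame}")
```

In this toy case the annotated model reaches its STOP only at the final codon, while the alternative model hits a STOP inside the retained sequence and leaves the downstream exon out of frame. A real check would need the actual genomic sequence and both sets of exon coordinates for the locus.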
Usually we as curators would have no hesitation in re-classifying this as a Pseudogene in the reference genome, but a number of publications with good phenotype data make this re-classification problematic.
There could be a number of things going on here:
- There are lab isolates where this gene is functional and the N2 reference contains a non-functional copy.
- The reference genome contains multiple errors in this exon 12/intron 13 region.
Reading the literature, there have been attempts to elucidate the correct sequence of this gene, but both the older and the newer WormBase annotations have been used to bridge partial RT-PCR data together, so the problem region has been overlooked.
What is needed is for someone working on cdh-4 to do proper genomic PCR and RT-PCR on this gene to confirm both the genome sequence and the transcript in their N2 stock.