We routinely try to identify genomic sequencing errors in the current C. elegans assembly.
A problem region was identified on F02C12, where there appears to be a deletion in the genomic sequence in the region of exon 8 of F02C12.1 when compared with the available transcript data.
EST agtaaaAttgcaaagg
    ||||||||||||||||
GEN agtaaa-ttgcaaagg
A single EST read (yk1227a01.5) suggests that there is an error in the genomic sequence. On inspecting the effect of the proposed change, the gene model would also benefit from it: exon 8 could be extended to utilise strong splicing that is confirmed by EST data.
On inspecting the original clone traces, no error could be identified, so the question has arisen: has the mRNA/EST been modified post-transcriptionally to produce a full, coding mRNA?
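The discrepancy in the alignment above can also be located programmatically. The following is only a minimal sketch: the function name `indel_columns` is hypothetical, and it assumes the two sequences have already been aligned and padded with `-` for gaps, as in the EST/GEN lines shown in this post.

```python
def indel_columns(est_aln, gen_aln):
    """Return (column, est_base, gen_base) for each column where exactly
    one of the two aligned sequences has a gap character ('-')."""
    if len(est_aln) != len(gen_aln):
        raise ValueError("aligned strings must have equal length")
    return [
        (i, e, g)
        for i, (e, g) in enumerate(zip(est_aln, gen_aln))
        if (e == "-") != (g == "-")
    ]


# The gapped alignment quoted in the post (columns are 0-based).
est = "agtaaaAttgcaaagg"
gen = "agtaaa-ttgcaaagg"
print(indel_columns(est, gen))  # [(6, 'A', '-')]
```

The single reported column is the extra A carried by the EST and absent from the genomic sequence.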
Might make a good project?
Don’t forget that during all cloning steps there is a possibility that enzymes, in particular reverse transcriptase in the case of cDNAs, make mistakes. So either the genomic or, more likely, the EST sequence could have an error.
I have seen that once in the case of ceh-21, where a cDNA had an indel with respect to the genomic.
For this gene, several ESTs are available, so one just needs to sequence the central unsequenced portion to find out what the most likely correct sequence is.
Thank you for the reply, Thomas,
That would normally be my first thought also, but in this case the current CDS utilises weak incorrect splicing of exon 7 to maintain frame and allow the 3’ end of the prediction to be modeled. It is unfortunate that there is only a single EST over this region as we currently don’t have the resources or funding to do any sequencing. For this reason I added the post in the hope that it might make a nice little project for a student.
I’ve actually identified a few more of these, some of which have multiple ESTs.
F49C5 F49C5.11 - deletion in genomic: caaaaagttgtttgtcaagctcccggagXcacgttgtgcgagagtgttacattagtggt (2 ESTs)
- Genomic looks ok…is this another example of post transcriptional modification?
Y57G11C Y57G11C.12 - possible frameshift in 5’ UTR would allow 2 new exons
- Genomic looks ok…is this another example of post transcriptional modification? Annotated 2 separate isoforms.
- Highly conserved in briggsae, exact same configuration.
- The second exon of F58G11.2 has taatcatAttccaatagc (one A) as shown by 5 good quality traces.
- The ESTs (yk1258g04.5, yk380d8.5, yk16e2.5) and OST (OSTF199D6_1) have taatcatAAttccaatagc (two A’s)
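For cases like this one, where the genomic and transcript sequences differ at a single site, the difference can be isolated by trimming the shared prefix and suffix. This is a rough sketch rather than a real aligner: the function name `single_difference` is hypothetical, and it assumes the two strings cover the same region and differ in only one place.

```python
def single_difference(genomic, transcript):
    """Trim the common prefix and suffix of two sequences and return
    (offset, genomic_segment, transcript_segment) for what remains.
    Comparison is case-insensitive; offset is 0-based."""
    g, t = genomic.lower(), transcript.lower()
    shortest = min(len(g), len(t))

    # Length of the shared prefix.
    p = 0
    while p < shortest and g[p] == t[p]:
        p += 1

    # Length of the shared suffix, not allowed to overlap the prefix.
    s = 0
    while s < shortest - p and g[len(g) - 1 - s] == t[len(t) - 1 - s]:
        s += 1

    return p, g[p:len(g) - s], t[p:len(t) - s]


# Sequences quoted in the post for the second exon of F58G11.2.
genomic = "taatcatAttccaatagc"      # one A (genomic traces)
transcript = "taatcatAAttccaatagc"  # two A's (ESTs and OST)
print(single_difference(genomic, transcript))  # (8, '', 'a')
```

Because the extra base sits in a run of identical bases, the reported offset marks the end of the run; the insertion could equally be placed one position earlier.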
Another candidate for post-transcriptional modification is lys-6 (C54C8.6): the EST data and genomic sequence agree, but the 2nd ORF appears to suffer a frameshift.
The new annotation in WS226 contains a false 2 bp intron to correct the frame and utilise the strong splicing available.