The 454 transcriptome data has been useful recently in fixing various aspects of our gene models, but it sometimes throws up some surprising things.
Today’s surprise is a pair of 454 transcripts (MM454_FPK17YK01B08N2, MM454_FPK17YK01BSONU) that are spliced, and their splice sites show that they align in the opposite orientation to the CDS ZK822.1.
So both 454 transcripts align across regions that are antisense to exons of a well-characterised gene.
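The orientation argument rests on the splice sites. Nearly all introns begin with GT and end with AG on the transcribed strand, so an intron that reads GT...AG on the plus strand implies a plus-strand transcript, while one that reads CT...AC implies a minus-strand transcript. Below is a minimal sketch of that check. It assumes canonical GT-AG introns only (GC-AG and other non-canonical introns come back as '?') and that the intron is given as plus-strand genomic sequence. The function names are hypothetical and are not part of any WormBase tool.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def intron_strand(intron_plus: str) -> str:
    """Infer the transcribed strand of an intron given as plus-strand sequence.

    Returns '+' for GT...AG, '-' if the reverse complement is GT...AG
    (i.e. CT...AC on the plus strand), and '?' otherwise.
    """
    s = intron_plus.upper()
    if s[:2] == "GT" and s[-2:] == "AG":
        return "+"
    rc = reverse_complement(s)
    if rc[:2] == "GT" and rc[-2:] == "AG":
        return "-"
    return "?"  # non-canonical or ambiguous intron: no call


def is_antisense(intron_plus: str, cds_strand: str) -> bool:
    """True if the spliced alignment lies on the opposite strand to the CDS."""
    strand = intron_strand(intron_plus)
    return strand != "?" and strand != cds_strand


# Hypothetical example: an intron reading CT...AC on the plus strand,
# overlapping a plus-strand CDS, is called antisense.
print(intron_strand("CTGAAAAACTTAC"))      # -
print(is_antisense("CTGAAAAACTTAC", "+"))  # True
```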
The second aligned region, after the “intron”, starts at an SL1 site defined by a TEC-RED tag.
Looking at the Illumina RNASeq data from Hillier et al. on the UCSC browser, there is a very faint hint of possible expression in the aligned region that overlaps the last exon of ZK822.1. This is not very convincing. Around the SL1 site, however, there is a strong region of expression that begins with the characteristic asymmetric RNASeq signal of a transcription start site and covers the exon and part of the intron, matching the region where the 454 reads align.
The RNASeq data show that ZK822.1 is strongly expressed in the Early Embryo life stage, whereas the antisense region starting at the SL1 site is well expressed in all the other life stages. This looks like a ‘natural antisense’ mechanism for controlling gene expression; see: http://en.wikipedia.org/wiki/Cis-natural_antisense_transcript
No long, convincing coding sequence can be formed from these 454 sequences.
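One quick way to test for coding potential is to find the longest ATG-initiated open reading frame across all six frames of a transcript. The helper below is a hypothetical sketch and not the method used in this curation. Because 454 reads are often fragments, it also counts an ORF that runs off the end of the sequence without reaching a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-initiated ORF
    in any of the six reading frames.

    ORFs that reach the end of the sequence without a stop codon are
    also counted, since the input may be a partial transcript.
    """
    best = 0
    for strand_seq in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the current open ATG, if any
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
            if start is not None:  # open-ended ORF at the end of the read
                best = max(best, (len(strand_seq) - start) // 3)
    return best


print(longest_orf_codons("ATGAAATAG"))  # 2 (ATG AAA, then TAG)
```

What counts as "convincing" is a judgement call. Any codon cutoff applied to the result would be a hypothetical threshold, not one taken from this analysis.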
So we have three sources of evidence for this transcript: 454 reads, a TEC-RED SL1 site and Illumina RNASeq reads.
Any ideas what it is?
[Edit]
I’ve just found another example.
The CDS Y57G11C.1 has 27 clustered 454 reads (MM454_contig03575) aligned to its first exon and part of the first intron.
The RNASeq data show a highly expressed region in the same place (if you also include RNASeq reads that align to multiple locations, the region extends further at the start). This region is expressed only in the ‘MALE6’ library.
No convincing coding sequence can be formed from these 454 sequences.
[Edit]
And another two:
F32F2.3 and F32F2.2, opposite the CDS F32F2.1a/b/c.
F32F2.3 has an SL1 site on its second exon.