I have been reading articles about the prediction of trans-spliced genes. They all point to a putative trans-splicing site. I wonder if someone could briefly explain how these trans-splicing sites are annotated.
We annotated them based on evidence from transcripts (mRNA/ESTs/RNASeq). As the trans-spliced leader sequences are largely constant, with only a few variants, it is possible to identify the part of a transcript's sequence that comes from the trans-splicing event. This makes it possible to uniquely identify the base position that acts as the splice acceptor site, which can then be mapped back onto the genome. This has already been used heavily in the curation of gene models (to identify translation starts) and operons (as Caenorhabditis operons tend to contain SL2 trans-spliced genes).
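To make the mapping step concrete, here is a minimal sketch in Python. It is not the actual curation pipeline; the function name `map_acceptor`, its parameters, and the exact-match strategy are assumptions for illustration only. It takes the bases immediately downstream of the leader and looks for a unique forward-strand match in a genomic sequence. A real workflow would use a spliced aligner and handle both strands.

```python
def map_acceptor(transcript, leader_len, genome, anchor_len=20):
    """Return the 0-based genome index of the first base after the acceptor site.

    Hypothetical helper: `leader_len` is the number of leader-derived bases
    at the 5' end of `transcript`. Only exact, forward-strand matches are
    considered, and an ambiguous or missing match yields None.
    """
    # Bases directly downstream of the spliced leader.
    anchor = transcript[leader_len:leader_len + anchor_len].upper()
    if len(anchor) < anchor_len:
        return None  # too little sequence left to place reliably

    genome = genome.upper()
    hits = []
    start = genome.find(anchor)
    while start != -1:
        hits.append(start)
        start = genome.find(anchor, start + 1)

    # Only report a position when the anchor maps to exactly one place.
    return hits[0] if len(hits) == 1 else None
```

The genomic base just upstream of the returned position is then the 3' end of the acceptor site. Keeping `anchor_len` short reduces the chance that the anchor runs across an intron, at the cost of more ambiguous matches.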
If I am looking for the splice-site motifs, what sequences should I go with?
Following your recommendations, I found the paper by Carol Williams (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC83895/), but it only deals with SL1 operons. What about SL1/SL2 genes in general?
Generally, SL2 trans-splicing only occurs in operons, so if you are looking at single-gene loci, the generic SL1 sequence GGTTTAATTACCCAAGTTTGAG| should be what you are looking for at the 5' end of the transcripts. Mind you, the sequence might be truncated or contain sequencing errors, depending on the method used … also, not all genes are trans-spliced (as far as we know).
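As a rough illustration of searching for SL1 at the 5' end while tolerating truncation and sequencing errors, here is a small Python sketch. The function name `find_sl_boundary` and the thresholds `min_len` and `max_mismatches` are assumptions chosen for the example, not established cutoffs. It compares progressively shorter 3' suffixes of the leader against the start of the read, since a 5'-truncated leader still keeps its 3' end.

```python
SL1 = "GGTTTAATTACCCAAGTTTGAG"  # generic SL1 sequence quoted above


def find_sl_boundary(read, leader=SL1, min_len=8, max_mismatches=1):
    """Return the index of the first read base after the leader, or None.

    Hypothetical helper: substitutions only, no indels. Shorter `min_len`
    values find more truncated leaders but raise the false-positive rate.
    """
    read = read.upper()
    # Try the longest possible leader suffix first, then shorter ones.
    for k in range(min(len(leader), len(read)), min_len - 1, -1):
        suffix = leader[-k:]   # 3' part of the leader
        prefix = read[:k]      # 5' part of the read
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return k
    return None
```

The returned index marks where the gene-derived sequence starts in the read, which is the base to map back to the genome as the acceptor position.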
Tom Blumenthal also took a computational approach to genomic trans-splicing signals, described here: http://rnajournal.cshlp.org/content/13/9/1409.full
and came up with various complex models of potential regulatory motifs.
I also wonder whether there is a server or website for splice-site prediction, given a genomic sequence segment. I know most of the C. elegans SL1/SL2-spliced genes have been annotated, but I am working on a non-elegans nematode.
From the Blumenthal collection: “C. elegans sequences that control trans-splicing and operon pre-mRNA processing” http://www.ncbi.nlm.nih.gov/pubmed/17630324
describing an UR domain that regulates SL2 trans-splicing and an OUT domain regulating the SL1 site.