There seems to be something odd happening with the three ESTs:
yk1279h06.5
yk1397e04.5
yk1518a07.5
The first 275 or so bases of these ESTs align perfectly to the first two exons of Y92H12BL.4 while the part after approximately base 275 aligns perfectly to the first two exons of Y92H12BL.1.
In a BLAT alignment, about 53% of the length of each of these ESTs matches the genome at Y92H12BL.1 but only about 43% matches the genome at Y92H12BL.4, which is why the best match for each EST is displayed in the genome viewer at the 5’ end of Y92H12BL.1.
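To make the arithmetic concrete, here is a minimal sketch of how a split alignment like this ends up placed at one gene only. The EST length and block coordinates are hypothetical, chosen just to reproduce the rough 43% / 53% split described above, and the rule "display the placement covering the largest fraction of the EST" is an assumption about the viewer, not something taken from its code.

```python
# Hypothetical EST length, chosen so that ~275 bases is about 43% of it.
EST_LENGTH = 640

# Hypothetical aligned EST intervals (1-based, inclusive) for each placement.
placements = {
    "Y92H12BL.4": [(1, 275)],
    "Y92H12BL.1": [(276, 615)],
}


def coverage(blocks, length):
    """Return the fraction of the EST covered by the aligned blocks."""
    aligned = sum(end - start + 1 for start, end in blocks)
    return aligned / length


fractions = {gene: coverage(blocks, EST_LENGTH) for gene, blocks in placements.items()}

# Assumed rule: only the placement with the largest covered fraction is shown.
best = max(fractions, key=fractions.get)

for gene, fraction in fractions.items():
    print(f"{gene}: {fraction:.0%} of the EST aligned")
print("displayed at:", best)
```

Under that rule the smaller Y92H12BL.4 placement is never drawn, so the chimaeric structure is only visible when both partial alignments are inspected.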
This is not the first EST I have seen that appears to be a chimaera of two nearby gene transcripts, but it is the first time that I have seen it happen three times nearly identically.
It is always possible that the same chimaeric transcript was sequenced three times under different names in error.
There is a large ORF across the full length of these ESTs giving a protein product which has good matches to the first half of “CDK5 regulatory subunit associated protein 1-like” in vertebrates.
`
gi|45361233|ref|NP_989194.1| CDK5 regulatory subunit associated protein 1-like 1 [Xenopus
tropicalis]
gi|38649005|gb|AAH63205.1| CDK5 regulatory subunit associated protein 1-like 1 [Xenopus
tropicalis]
Length=553
Score = 184 bits (467), Expect = 2e-45, Method: Composition-based stats.
Identities = 94/198 (47%), Positives = 135/198 (68%), Gaps = 11/198 (5%)
Query 6 DIEDIVGR---GPVGSRDANE-IKIRTRKQVPKEQQPDDANVDSMVPGVGQKVWVRTWGC 61
DIEDIV P ++A + I R RK+ + Q ++ DS +PG QK+W+RTWGC
Sbjct 11 DIEDIVSATDPKPHDRQNARQNIVPRARKRNKNKIQEEEPPADSTIPGT-QKIWIRTWGC 69
Query 62 SHNTSDSEYMSGLLQQAGYDVVKEPETAQVWVLNSCTVKTPSEQQANNLVVQGQEQGKKI 121
SHN SD EYM+G L GY + ++PE A +W+LNSCTVK+P+E N + + QE KK+
Sbjct 70 SHNNSDGEYMAGQLAAYGYSITEQPEQADLWLLNSCTVKSPAEDHFRNSIKKAQEANKKV 129
Query 122 IMAGCVSQAAPSEPWLQNVSIVGVKQIDRIVEVVGETLKGNKVRLLTRNRPD------AV 175
+++GCV QA P + +++ +SI+GV+QIDR+VEVV ET+KG+ VRLL + + + A
Sbjct 130 VLSGCVPQAQPRQEYMKGLSIIGVQQIDRVVEVVEETIKGHSVRLLGQKKDNGKRLGGAR 189
Query 176 LSLPKMRKNELIEVLSIS 193
L LPK+RKN LIE++SI+
Sbjct 190 LDLPKIRKNPLIEIISIN 207
`
The Y92H12BL.1 protein product matches the “CDK5 regulatory subunit associated protein 1-like” proteins of many species very well:
`
gi|39598390|emb|CAE69083.1| Hypothetical protein CBG15101 [Caenorhabditis briggsae]
Length=437
Score = 694 bits (1792), Expect = 0.0, Method: Composition-based stats.
Identities = 333/356 (93%), Positives = 345/356 (96%), Gaps = 0/356 (0%)
Query 1 MAGCVSQAAPSEPWLQNVSIVGVKQIDRIVEVVGETLKGNKVRLLTRNRPDAVLSLPKMR 60
MAGCVSQAAPSEPWLQNVSIVGVKQIDRIVEVV ETLKGNKVRLLTRNRPDA+LSLPKMR
Sbjct 80 MAGCVSQAAPSEPWLQNVSIVGVKQIDRIVEVVEETLKGNKVRLLTRNRPDALLSLPKMR 139
Query 61 KNELIEVLSISTGCLNNCTYCKTKMARGDLVSYPLADLVEQARAAFHDEGVKELWLTSED 120
KNELIEVLSISTGCLNNCTYCKTKMARGDLVSYPL DLVEQARAAFHDEGVKELWLTSED
Sbjct 140 KNELIEVLSISTGCLNNCTYCKTKMARGDLVSYPLEDLVEQARAAFHDEGVKELWLTSED 199
Query 121 LGAWGRDIGLVLPDLLRELVKVIPDGSMMRLGMTNPPYILDHLEEIAEILNHPKVYAFLH 180
LGAWGRDI LVLPDLL LVKVIPDG MMRLGMTNPPYILDHLEEIAEILN+PKVYAFLH
Sbjct 200 LGAWGRDINLVLPDLLNALVKVIPDGCMMRLGMTNPPYILDHLEEIAEILNNPKVYAFLH 259
Query 181 IPVQSASDAVLNDMKREYSRRHFEQIADYMIANVPNIYIATDMILAFPTETLEDFEESME 240
IPVQSASDAVL DMKREYSRRHFEQIADYMI NVPNIYIATDMILAFPTETLEDFEESME
Sbjct 260 IPVQSASDAVLTDMKREYSRRHFEQIADYMIENVPNIYIATDMILAFPTETLEDFEESME 319
Query 241 LVRKYKFPSLFINQYYPRSGTPAARLKKIDTVEARKRTAAMSELFRSYTRYTDERIGELH 300
LVRKYKFPSLFINQYYPRSGTPAARLKKIDTVEARKRTAAMSELFRSYTR+T++RIGE+H
Sbjct 320 LVRKYKFPSLFINQYYPRSGTPAARLKKIDTVEARKRTAAMSELFRSYTRFTEDRIGEIH 379
Query 301 RVLVTEVAADKLHGVGHNKSYEQILVPLEYCKMGEWIEVRVTAVTKFSMISKPASI 356
VLVTE+AADK+HGVGHNKSYEQILVPLE+CKMGEWIEVR+T+VTKFSMIS P S+
Sbjct 380 NVLVTEIAADKIHGVGHNKSYEQILVPLEHCKMGEWIEVRITSVTKFSMISTPTSL 435
`
while the Y92H12BL.4 protein product appears to be a merger of two types of protein:
`
gi|39598389|emb|CAE69082.1| Hypothetical protein CBG15100 [Caenorhabditis briggsae]
Length=646
Score = 202 bits (514), Expect = 1e-50, Method: Composition-based stats.
Identities = 119/132 (90%), Positives = 124/132 (93%), Gaps = 3/132 (2%)
Query 131 NEKK---RRGWELLAIVLAFIFPTEAINEKLNEFLNKHLDPIFDLPEVSTSYFSQQCIKR 187
NEK RRGWELL IVLAFIFPTEAI+EKLN+FLNKHLD IFDLPEVSTSYFSQQC+KR
Sbjct 365 NEKPDSLRRGWELLTIVLAFIFPTEAISEKLNDFLNKHLDSIFDLPEVSTSYFSQQCLKR 424
Query 188 LSKVIARLKPSLSTIQETKIHIFRPPLFSASLEELMQMQSEKFpelklpwllttliellY 247
LSKVI RLKPSL +IQETKIHIFRPPLFSASLEELMQMQSEKFPELKLPWLLTTLIELLY
Sbjct 425 LSKVITRLKPSLQSIQETKIHIFRPPLFSASLEELMQMQSEKFPELKLPWLLTTLIELLY 484
Query 248 QSGGRRTEGLFR 259
QSGGRRTEG+FR
Sbjct 485 QSGGRRTEGIFR 496
`
and
`
gi|39598390|emb|CAE69083.1| Hypothetical protein CBG15101 [Caenorhabditis briggsae]
Length=437
Score = 88.2 bits (217), Expect = 3e-16, Method: Composition-based stats.
Identities = 37/41 (90%), Positives = 40/41 (97%), Gaps = 0/41 (0%)
Query 40 DSMVPGVGQKVWVRTWGCSHNTSDSEYMSGLLQQAGYDVVK 80
DSMVPGVGQKVWVRTWGCSHNTSDSEYM+GLL +AGYDV+K
Sbjct 1 DSMVPGVGQKVWVRTWGCSHNTSDSEYMAGLLHKAGYDVLK 41
`
Y92H12BL.4 should almost certainly be split into two separate genes.
There is mass spectrometry evidence, from the peptide
MSP:EQQPDDANVDSMVPGVGQK
on the second exon of Y92H12BL.4, that the genomic region to which the three ESTs align produces a protein product.
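As a quick consistency check, the peptide can be searched for in the EST ORF. The sketch below uses only the first Query line (residues 6 to 61) of the Xenopus alignment above, with the gap characters removed; the variable names are mine, and it says nothing about exon boundaries, only whether the peptide sequence is present in the translated ESTs.

```python
# Query residues 6-61 from the Xenopus alignment above, gaps removed.
est_orf_fragment = "DIEDIVGRGPVGSRDANEIKIRTRKQVPKEQQPDDANVDSMVPGVGQKVWVRTWGC"
FRAGMENT_START = 6  # ORF coordinate of the first residue in the fragment

# Mass spectrometry peptide quoted above.
peptide = "EQQPDDANVDSMVPGVGQK"

offset = est_orf_fragment.find(peptide)
if offset == -1:
    print("peptide not found in this part of the EST ORF")
else:
    # Convert the 0-based string offset to 1-based ORF coordinates.
    start = FRAGMENT_START + offset
    end = start + len(peptide) - 1
    print(f"peptide found at EST ORF residues {start}-{end}")
```

The peptide lies in the 5’ part of the ORF, upstream of the region that matches Y92H12BL.1 exactly.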
Gary