EST lacking a STOP codon but with a polyA tail present?

This is an interesting Transcript that deserves to be documented.

The EST FN902134 appears to have a polyA tail in the middle of the read, followed by poor-quality sequence that does not match the genomic sequence of C. elegans.
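As a quick check of where the polyA run sits in the read, the following minimal sketch looks for the longest run of A residues in an excerpt of the record. The excerpt is positions 361-540 copied from the SQ block of the EMBL entry reproduced further down; the function name `longest_a_run` and the 15-base minimum run length are illustrative assumptions, not part of any curation pipeline.

```python
import re

# Positions 361-540 of FN902134, copied from the SQ block of the EMBL record.
EXCERPT_START = 361  # 1-based coordinate of the first base in the excerpt
excerpt = (
    "aaactttatt ctacacactt aagatgtaat cggagtaatc gttgaggaaa tttctttatg"
    "gtgcatcgac caatcattac atgaaattaa taaatgttta tacagttaaa aaaaaaaaaa"
    "aaaaaaaaaa aaaaaaaaac ccttttactt gcgttttttt aaaaaaaaaa aaaagaagag"
).replace(" ", "")

READ_LENGTH = 1334  # from the ID line of the record


def longest_a_run(seq, min_len=15):
    """Return (start, end) of the longest run of 'a' at least min_len long.

    Coordinates are 0-based, end-exclusive. Returns None if no run qualifies.
    The min_len threshold is an arbitrary choice for this sketch.
    """
    runs = list(re.finditer(r"a{%d,}" % min_len, seq.lower()))
    if not runs:
        return None
    best = max(runs, key=lambda m: m.end() - m.start())
    return best.start(), best.end()


run = longest_a_run(excerpt)
tail_start = EXCERPT_START + run[0]       # 1-based, inclusive
tail_end = EXCERPT_START + run[1] - 1     # 1-based, inclusive
tail_length = run[1] - run[0]
bases_after_tail = READ_LENGTH - tail_end

print(f"polyA run: {tail_start}..{tail_end} ({tail_length} nt)")
print(f"bases following the run: {bases_after_tail}")
```

Run on this excerpt, the sketch places the polyA run well inside the 1334 bp read, which is consistent with the observation that the remainder of the read is trailing low-quality sequence rather than transcript.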

Once clipped and marked up, the 3’ sequence is:

gatgtaatcggagtaatcgttgaggaaatttctttatggtgcatcgaccaatcattacatgaaattaataaatgtttatacagttaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Purple = PolyA
Green = PolyA signal sequence
Red = 4 nucleotides where the STOP codon should be, but is missing/mutated.
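The relationship between the polyA signal and the tail can be checked with a short sketch. It assumes the signal is the canonical aataaa hexamer (an assumption, since the colour markup is not preserved in this text) and writes the tail as 32 A residues, the run length seen at positions 468-499 of the database record. The variable names are illustrative only.

```python
import re

# Clipped 3' sequence upstream of the polyA tail (from the text above).
body = (
    "gatgtaatcggagtaatcgttgaggaaatttctttatggtgcatcgacca"
    "atcattacatgaaattaataaatgtttatacagtt"
)
# Tail written as 32 A residues, matching positions 468-499 of the record.
clipped = body + "a" * 32

SIGNAL = "aataaa"  # canonical polyA signal hexamer; assumed, not confirmed by the markup

# Locate the terminal polyA run (15 or more A residues at the 3' end).
tail = re.search(r"a{15,}$", clipped)

# Find the signal occurrence closest to, but entirely upstream of, the tail.
signal_at = clipped.rfind(SIGNAL, 0, tail.start())

# Number of bases between the end of the signal and the first A of the tail.
gap = tail.start() - (signal_at + len(SIGNAL))

print(f"signal at index {signal_at} (0-based), tail starts at index {tail.start()}")
print(f"bases between signal and tail: {gap}")
```

Because the first residues of the tail could in principle be templated A residues rather than added ones, the reported gap should be read as approximate.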

It appears that the corresponding gene resides in a region of the genome that has become degraded, so this is likely to be a Pseudogene.

What is strange is that the Transcript is polyadenylated despite lacking a valid STOP codon.
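One way to look at the STOP codon question is to list the stop codons in each of the three forward reading frames of the clipped sequence upstream of the tail. This is only a sketch: frame 0 here simply starts at the first base of the clipped sequence, and which frame (if any) corresponds to the coding frame of the gene is not stated in this note, so the coordinates of the upstream exons would be needed to interpret the result. The function name `stops_by_frame` is hypothetical.

```python
STOPS = {"taa", "tag", "tga"}


def stops_by_frame(seq):
    """Map each forward frame (0, 1, 2) to the 0-based start indices of
    stop codons found in that frame. Frame numbering is relative to the
    first base of seq, not to any annotated coding sequence."""
    seq = seq.lower()
    found = {}
    for frame in range(3):
        found[frame] = [
            i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS
        ]
    return found


# Clipped 3' sequence upstream of the polyA tail (tail itself excluded).
upstream = (
    "gatgtaatcggagtaatcgttgaggaaatttctttatggtgcatcgacca"
    "atcattacatgaaattaataaatgtttatacagtt"
)

result = stops_by_frame(upstream)
for frame, positions in result.items():
    print(f"frame {frame}: stop codons at {positions if positions else 'none'}")
```

A frame with no stop codon before the tail would be consistent with the observation above, but only if that frame is the one actually used by the gene.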

ID FN902134; SV 1; linear; mRNA; EST; INV; 1334 BP.
XX
AC FN902134;
XX
DT 08-JUN-2010 (Rel. 105, Created)
DT 08-JUN-2010 (Rel. 105, Last updated, Version 1)
XX
DE Caenorhabditis elegans EST, strain N2, clone
DE abi3100_ce1_20060618.G08_12406976_5_GR3OTR_060619.f
XX
KW EST; expressed sequence tag.
XX
OS Caenorhabditis elegans
OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC Rhabditidae; Peloderinae; Caenorhabditis.
XX
RN [1]
RP 1-1
RA Ewing B.;
RT ;
RL Submitted (04-JUN-2010) to the EMBL/GenBank/DDBJ databases.
RL Ewing B., Genome Sciences, Foege S320C, Box 355065, University of
RL Washington, Seattle, WA 98195-5065, USA.
XX
RN [2]
RA Ewing B., Davis C., Gordon D., Hillier L., Green P.;
RT "C. elegans gene structure determination using RT-PCR testing of Genefinder
RT gene predictions";
RL Unpublished.
XX
FH Key Location/Qualifiers
FH
FT source 1..1334
FT /organism="Caenorhabditis elegans"
FT /strain="N2"
FT /mol_type="mRNA"
FT /sex="mixed"
FT /dev_stage="all"
FT /clone="abi3100_ce1_20060618.G08_12406976_5_GR3OTR_060619.f"
FT /tissue_type="whole worm"
FT /PCR_primers="fwd_name: 12406976f, fwd_seq:
FT tcactccgaatgtttgcga, rev_name: GR3OTR, rev_seq:
FT gctgtcaacgatacgctacgtaacg"
FT /db_xref="taxon:6239"
XX
SQ Sequence 1334 BP; 429 A; 144 C; 437 G; 324 T; 0 other;
agaggggagg gctaatattt ttgtctaact tatgcaaatc agatgattac gatcaacgca 60
agatacgcaa cggccagcat tcgatataat cgagtaccac agcatgcgca tggagttcaa 120
tcagtcaaaa ttgcaggagg cgtcactttt catccgtcgg ccatgtatac tcgaaactcg 180
aatttcctaa actttgcaaa ttttatcata atgcaacatt caaataagga aattgcaagc 240
atgcggtatg aactgatgaa gaagaaagtg gactcggttc gtctacggaa tttattcgaa 300
attattcgat ggaactttcc ggtgccccga ttgccgcaga acgacatcat acaaaactac 360
aaactttatt ctacacactt aagatgtaat cggagtaatc gttgaggaaa tttctttatg 420
gtgcatcgac caatcattac atgaaattaa taaatgttta tacagttaaa aaaaaaaaaa 480
aaaaaaaaaa aaaaaaaaac ccttttactt gcgttttttt aaaaaaaaaa aaaagaagag 540
aaaaaaaata aaaaaaggga aggaagtaaa tggttaggcc ggaagaggtt aaaaaaaaaa 600
gtggacagtc ggatggagga agaaaggggg agaggtgagg tggaaagaga gagatgtgtt 660
tgtggattat taaaagagga taatataagt tgcttgtggg agtaatatga tgttaggaga 720
ttaaatgaag aaaagagggg agaggtcggt tttaaaagat gaggttgaga ggaagtggca 780
ggagaatgat gtgtggtgat ggacggcagg gggaggggtg aggggtgggg gttaggggag 840
tagaggggga gatggggggg gggagggtgt gaatgtaggg gggctatgag gcaagagggt 900
acgaatggga gcgaaaaggg acgtgctaag gcgtaagatg tatgtatgag gtgggatgtg 960
tctgggtgag gaggcggggt gaatggaggg ggtgaagttg tgtcggggga gtgtgtttgg 1020
gtcggggata ggagagatgt cgatgagggg tgttagagta tgtaatgatg tgttatagat 1080
gttgatgaag tgttgtagtg gtgtgaggtt ataagatagg ggttatggcg ttggaagtcg 1140
tggtattata ggcgaagtgc gaccagacgc cccgtggtat gttatgaata agtattagtc 1200
aggcgaaagg cgaggttgtg tgtgatacgg ataaacgggg ttatgatata gtctcatgta 1260
agtgcgatct taggatacga aggcgtcgga ggtcgggtgt tatgggagga cgcggaggtg 1320
gggtgcgtgg atgc 1334
//