That is an EST from EMBL/DDBJ/GenBank which aligns to the C. elegans Y79H2A.1a CDS and the Y79H2A.1a.1 transcript.
It does not imply that the sequences of Y79H2A.1a.1 and CN797654 are the same.
For EnsEMBL you also don’t need a WormBase-to-EnsEMBL ID mapping, as I used the WormBase Sequence Names as EnsEMBL GeneIds.
It should also be possible to use WQL to get the matching_CDS for a list of sequence IDs (such as EMBL accessions), but I would have to look up in the WormBase Wiki how to get a nicely formatted result set.
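A rough, untested sketch of the query-based route might look like the following. It assumes that WQL here means the classic AceDB query language ("find ... ; follow ...") and that the server accepts such a query through AcePerl's fetch(-query => ...). The helper name matching_cds_query is made up for illustration, and the host and port are simply the ones used in the script further down.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a classic AceDB query that finds one Sequence
# object by name and follows its Matching_CDS tag.
sub matching_cds_query {
    my ($accession) = @_;
    return qq{find Sequence IS "$accession" ; follow Matching_CDS};
}

# Contact the server only when accessions are given on the command line.
if (@ARGV) {
    require Ace;
    # Assumption: same public server and port as in the script in this post.
    my $ace = Ace->connect(-host => 'aceserver.cshl.org', -port => 2005)
        or die "Unable to connect: ", Ace->error, "\n";
    for my $accession (@ARGV) {
        # In list context fetch returns every object the query ends on.
        my @cds = $ace->fetch(-query => matching_cds_query($accession));
        my $hits = @cds ? join(',', @cds) : 'no matching CDS';
        print "$accession\t$hits\n";
    }
}
```

Called with one or more accessions as arguments (for example CN797654), this would print one tab-separated line per accession. Whether the result set is formatted the way you need is exactly the part that still has to be checked against the WormBase Wiki.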
The following script will do the mapping for you against the WormBase open access data mining server.
Usage:
ensembl2CDS.pl accession_ids.txt
–
#!/usr/bin/perl
# Author: Todd Harris (harris@cshl.edu)
# Please contact me directly if you have questions on how to use this script.
# Copyright (@) 2008, Cold Spring Harbor Laboratory
# Scriptname: ensembl2CDS.pl
# Map ENSEMBL IDs to WormBase CDSs
# 16 January 2008
use strict;
use Ace;
my $ace = Ace->connect(-host => 'aceserver.cshl.org',
                       -port => '2005') or die "Sorry, I was unable to connect to the remote WormBase database...\n";
my $version = $ace->status->{database}{version};
open OUT, '>', "$version-ensembl2CDS.txt" or die "Unable to open $version-ensembl2CDS.txt for writing: $!\n";
while (<>) {
chomp;
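# Look up the accession as a Sequence object; the result is false if none is found.
my $sequence = $ace->fetch(Sequence => $_);
# Follow Matching_CDS, then the CDS's Gene tag, without calling methods on undef.
my $cds  = $sequence ? $sequence->Matching_CDS : undef;
my $gene = $cds ? $cds->Gene : undef;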