ID mapping question

nemazur · January 16, 2008, 5:43am

Hi,
I’d like to be able to map a list of ENSEMBL ids to WormBase sequence ids.

I see on the page:
http://www.wormbase.org/db/misc/etree?name=CN797654;class=Sequence

that there is a mapping between

CN797654

and the matching_CDS: Y79H2A.1a

Is there a query that could do this conversion for a list of ENSEMBL ids?

Thanks,
Lisa

mh6 · January 16, 2008, 9:17am

That is an EST from EMBL/DDBJ/GenBank which aligns to the C. elegans Y79H2A.1a CDS and the Y79H2A.1a.1 Transcript.
It does not imply that the sequences of Y79H2A.1a.1 and CN797654 are the same.

For EnsEMBL you don't need a WormBase-to-EnsEMBL ID mapping either, because I used the WormBase Sequence Names as the EnsEMBL GeneIds.

It should also be possible to use WQL to get the matching_CDS for a list of sequence ids (like EMBL accessions) … but I have to look that up in the WormBase Wiki to get a nicely formatted result set.

Michael

tharris · January 16, 2008, 5:02pm

Hi Lisa -

This script will do the mapping for you against the WormBase open access data mining server.

Usage (accession_ids.txt lists one ID per line):

ensembl2CDS.pl accession_ids.txt

--

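#!/usr/bin/perl

# Author: Todd Harris (harris@cshl.edu)
# Please contact me directly if you have questions on how to use this script.
#
# Scriptname: ensembl2CDS.pl
# Map ENSEMBL IDs to WormBase CDSs
# 16 January 2008

use strict;
use warnings;
use Ace;

# Connect to the WormBase open access data mining server.
my $ace = Ace->connect(-host => 'aceserver.cshl.org',
                       -port => '2005')
    or die "Sorry, I was unable to connect to the remote WormBase database...\n";

# Name the output file after the database version.
my $version = $ace->status->{database}{version};
my $outfile = "$version-ensembl2CDS.txt";
open my $out, '>', $outfile or die "Cannot write to $outfile: $!\n";

# Read one ID per line from the file(s) named on the command line.
while (my $id = <>) {
    chomp $id;
    next unless length $id;    # skip blank lines

    # Look up the ID as a Sequence object; warn and move on if it is missing.
    my $sequence = $ace->fetch(Sequence => $id);
    unless ($sequence) {
        warn "No Sequence object found for $id\n";
        next;
    }

    # Matching_CDS and Gene may be absent; those columns are then left empty.
    my $cds  = $sequence->Matching_CDS;
    my $gene = $cds ? $cds->Gene : undef;

    print {$out} join("\t", map { defined $_ ? $_ : '' } $sequence, $cds, $gene), "\n";
}

close $out or die "Cannot close $outfile: $!\n";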