Bulk Download SNPs with Alleles

I’m trying to download all the SNPs in C. elegans but am unable to find a source that includes the N2 and non-N2 alleles, aside from the individual SNP pages, where this information is listed under ‘Molecular Details’ (e.g. http://www.wormbase.org/db/gene/variation?name=pas1;class=Variation). I tried all the WormMart options and none of them provided the alleles. The GFF files do not appear to have that information either. I am considering parsing the webpages directly but am hoping there is a more straightforward option. Thank you for your help.

If you want to download all N2 alleles induced in C. elegans laboratories, as opposed to single nucleotide polymorphisms identified in wild isolates, I was able to download the alleles by searching WormBase Build 195, Category “Variations”, and filtering the results by selecting every non-SNP category in the field “Isolation History>>Isolation Method”. This returned 24715 alleles. Searching the same category for “SNPs” only returned 104684 SNPs rather than N2 alleles.

You can then use additional fields to return all the categories seen on the variation page.

hope this helps…

Hi,
To download all SNPs you can use WormMart.

Here is the query, which you can modify to your precise requirements:
http://tinyurl.com/elegans-SNPs

As far as I know all of the SNPs are differences between N2 and the specified strain

Ant

I think I must not have been clear about what my issue is. I want to know that in N2 a given SNP is a ‘G’ and in CB4858 it is a ‘T’. This information seems to only be available in the ‘Molecular Details’ on a specific page for a SNP (e.g. http://www.wormbase.org/db/gene/variation?name=pas1;class=Variation). That is very useful if you just want to know that information for one SNP but I want to download a table of SNPs with that information included.

I tried selecting every single field of output on WormMart in the Variation section and did not see this information anywhere. Either I am missing something about how WormMart works or it currently is not possible to get this information through WormMart. If I am correct that it is not possible, I am hoping someone can direct me to a place where I can download this information.

Thank you.

You’re right - I’m afraid you can’t get that data from WormMart. It’s pretty easy to get from the acedb database though so I’ve put a file on the Sanger FTP site with the results.

ftp://ftp.sanger.ac.uk/pub2/wormbase/WS201/genomes/c_elegans/annotation/WS201_elegans_SNP_changes.txt.gz

It has four columns: SNP name; N2 nucleotide; mutant strain nucleotide; strain (where available), e.g.

bsP1 g t CB4856
bsP2 t c CB4856

All C. elegans SNPs where this detail is annotated are included.
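
For anyone loading this file into a script, here is a minimal Python sketch of reading the four columns. It assumes the columns are whitespace-delimited (the sample rows above suggest this, but tabs versus spaces has not been checked) and that the strain column may be absent. The names `SnpChange`, `parse_snp_changes` and `load_snp_changes` are made up for illustration.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class SnpChange(NamedTuple):
    name: str
    n2_allele: str
    variant_allele: str
    strain: Optional[str]  # None when no strain is annotated


def parse_snp_changes(handle: TextIO) -> Iterator[SnpChange]:
    """Yield one SnpChange per usable line of the table."""
    for line in handle:
        fields = line.split()  # assumption: whitespace-delimited columns
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        strain = fields[3] if len(fields) > 3 else None
        yield SnpChange(fields[0], fields[1], fields[2], strain)


def load_snp_changes(path: str) -> List[SnpChange]:
    """Read a gzipped copy of the table, e.g. the downloaded .txt.gz file."""
    with gzip.open(path, "rt") as handle:
        return list(parse_snp_changes(handle))


# Demonstration on the two sample rows quoted above.
sample = io.StringIO("bsP1 g t CB4856\nbsP2 t c CB4856\n")
records = list(parse_snp_changes(sample))

# Keep only the SNPs annotated for one strain.
cb4856 = [r for r in records if r.strain == "CB4856"]
for r in cb4856:
    print(f"{r.name}: N2={r.n2_allele} {r.strain}={r.variant_allele}")
```

Once the file has been downloaded, `load_snp_changes("WS201_elegans_SNP_changes.txt.gz")` would read the whole table, and the same list comprehension filters it by strain.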

Ant

Ant, thank you so much for your help with this. I have one more request to make this table fully usable. I’m not currently working with WS201 and the tool that maps releases doesn’t go up to WS201 (WS199 is the highest). Would it be possible to regenerate this table with coordinates from WS199 or earlier, or to tell me how I can map these coordinates to an earlier release?

Thank you!

Hi,
The data in that file is independent of the genomic coordinates, e.g. bsP1 is g → t whatever version of the genome you’re using. When genomic sequence changes are made we remap all of the variations to take this into account.

If you do need to remap something between releases, note that there are sequence changes between WS199 and WS201, so the coordinates will be different. The most recent remapping files (including the embryonic WS203!) are here:
ftp://ftp.sanger.ac.uk/pub/wormbase/Remap-between-versions/remap.tar.bz2

Ant

Thank you Ant. I somehow overlooked WS200 and WS201 in the remap directory.

Hello Ant, is there any way you could supply the query you used to obtain this data? I would like to grab the latest set of SNPs for CB4856 for WS209 along with the WS209 coordinates.

Many thanks!!

I used the acedb table-maker query tool to produce this file. Unfortunately, there is no web version of this despite several abortive attempts over the years. However, it is possible to embed a table-maker query in a Perl script. You need to have acedb installed (free from www.acedb.org) and the Ace Perl libraries, which you can get from CPAN. If this sounds like a route you’re comfortable with, let me know and I’ll write some guidelines on the wiki.

Ant

I am interested in it - yes please write some guidelines!

I’d also be interested in downloading these tables.

Thanks,
Harold