singleton genes

Hi. Does anyone know if there is a list somewhere, or a way to generate a list somehow, of singleton genes in the C. elegans genome? A complete list is not necessary, but I’ll need maybe 50-100 from which to choose. I need singleton genes from other species of nematodes for which there is virtually no genetic information available, and I don’t have the time to empirically determine whether or not paralogues exist in these genomes. The best I can do is hope that a singleton gene in C. elegans will also be a singleton in my species. Thanks.

Hi,
Can you clarify what you mean by “singleton genes”? I’ve worked with different people in the past who have used this term to mean completely different things!

Ant

Thanks for the interest, Ant. Don’t you just hate the lack of consensus on definitions? By singletons, I mean genes that have no paralogues in the genome. The term “single-copy genes” is perhaps more commonly used, but this term can also have different meanings. I’m looking for loci that I can amplify with sequence-specific primers without fearing that I’m amplifying a different gene from a gene family that has a similar sequence. This is for a phylogenetic analysis where you really want to be comparing apples to other apples and not to things that just look like apples. No paralogues, no pseudogenes, no copies on episomes or minichromosomes, no jumping genes, no recently duplicated genes where the sequences are very similar but not quite the same.

There is a file produced for each WormBase release that contains the best blastp hit for each elegans protein -
ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/annotations/blast_hits/best_blastp_hits.WS190.gz

This file is described here - http://www.wormbase.org/wiki/index.php/FAQs#How_can_I_retrieve_the_best_blast_p_scored_homologies_of_worm_genes.3F

Take this file, uncompress it, and identify those proteins that don’t have an elegans blastp hit. A simple way to do this would be to use the Unix grep command, i.e.

grep -v 'WP:' best_blastp_hits.WS190

as elegans hits are preceded by ‘WP:’.

This will give you a list of 3818 proteins. The above wiki page describes how you can get from protein ids to genes. Get back to me if you have any problems doing this.
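
If you would rather not uncompress the file first, the same filter can be sketched in Python. This is only a rough equivalent of the grep command above, not anything WormBase provides: the function names are made up for illustration, and the only thing assumed about the file is that elegans hits carry the ‘WP:’ prefix, as described above.

```python
import gzip
from typing import Iterable, Iterator, List

# Per the description above, elegans hits are preceded by this tag.
ELEGANS_PREFIX = "WP:"


def without_elegans_hit(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that do not contain 'WP:' anywhere (like grep -v 'WP:')."""
    for line in lines:
        if ELEGANS_PREFIX not in line:
            yield line.rstrip("\n")


def filter_gzipped(path: str) -> List[str]:
    """Read a gzipped text file and return the lines with no elegans hit."""
    # "rt" opens the compressed file in text mode, so lines arrive as str.
    with gzip.open(path, "rt") as handle:
        return list(without_elegans_hit(handle))
```

Calling filter_gzipped("best_blastp_hits.WS190.gz") on the downloaded file should return the same lines that the grep command prints.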

Ant

Thanks a million, Ant. This will save me a lot of time dredging through the literature.