OK, so the title is deliberately provocative, but what I want to address is the number of genes in WormBase which don’t look very believable yet will never be removed (under current curation guidelines).
About 20% of all WormBase genes have no transcript evidence at all, and within this set are some that I personally do not believe. For about half of them there is actually evidence against their being real: over 2,000 genes could not be amplified by the ORFeome project and also have no ortholog predicted or detected in C. briggsae. Moreover, many of this subset have no DNA matches to C. briggsae at all (deduced from the lack of WABA alignments).
One example of this is 6R55.2 (http://www.wormbase.org/db/gene/gene?name=WBGene00007069;class=Gene). This short, three-exon gene fulfils the above criteria and, additionally, its first exon is wholly contained within a repeat. The ‘evidence’ for this gene stems from its initial genefinder prediction (which is not really evidence) and subsequent protein homology (from BLASTP hits). The best BLASTP hit is to a partial fragment of a whole genome shotgun sequence in a fish, and UniProt has nothing to say at all about this protein. Furthermore, the highest scoring match to the fish protein is actually another C. elegans gene.
Another example is AH10.4 (http://www.wormbase.org/db/gene/gene?name=WBGene00007085;class=Gene). This very short gene is in the intron of another gene on the same strand. There are no abnormal RNAi phenotypes (another trait in common with many of these genes) and the best non-worm match covers just 32 amino acids and has a very low score.
So I believe that there will always be ‘evidence’ from BLAST hits, but not evidence that I find believable. Some of these gene predictions have been around for nearly a decade now and still have no good evidence for their existence. My suggestion would be to remove them from the canonical gene set, remove their proteins from WormPep, and maybe introduce a retired/spurious category so that they could appear in a separate track in the genome browser (and maybe exist in a separate BLAST database). If they are real, then people who work on these genes will probably be swift to let WormBase know!
At some level I think there should be some (evidence-based) justification for every worm gene, or at least a visible scoring system which allows people to say ‘yep, this is the worst scoring worm gene’.
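To make the scoring idea concrete, here is a minimal sketch of what such a system might look like. It is not anything WormBase actually does: the field names, the weights, and the repeat penalty are all assumptions chosen purely for illustration, and the genes in the usage example are placeholders rather than real loci. The evidence types are simply the ones mentioned above (transcript evidence, ORFeome amplification, a C. briggsae ortholog, WABA alignments, RNAi phenotype, overlap with a repeat).

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only (not WormBase policy).
WEIGHTS = {
    "transcript_evidence": 3,
    "orfeome_amplified": 2,
    "briggsae_ortholog": 2,
    "waba_alignment": 1,
    "rnai_phenotype": 1,
}
# Hypothetical penalty when an exon lies wholly within a repeat.
REPEAT_PENALTY = 1


@dataclass
class GeneEvidence:
    """Evidence flags for one gene prediction (all field names are assumptions)."""
    name: str
    transcript_evidence: bool = False
    orfeome_amplified: bool = False
    briggsae_ortholog: bool = False
    waba_alignment: bool = False
    rnai_phenotype: bool = False
    exon_in_repeat: bool = False


def evidence_score(gene):
    """Sum the weights of the evidence types present, minus the repeat penalty."""
    score = sum(w for field, w in WEIGHTS.items() if getattr(gene, field))
    if gene.exon_in_repeat:
        score -= REPEAT_PENALTY
    return score


def rank_worst_first(genes):
    """Return the genes ordered from least to most supported."""
    return sorted(genes, key=evidence_score)


if __name__ == "__main__":
    # Placeholder genes, not real WormBase records.
    genes = [
        GeneEvidence("placeholder_A", transcript_evidence=True,
                     orfeome_amplified=True, briggsae_ortholog=True),
        GeneEvidence("placeholder_B", exon_in_repeat=True),
    ]
    for g in rank_worst_first(genes):
        print(g.name, evidence_score(g))
```

With something like this, a gene scoring zero or below would be an obvious candidate for a retired/spurious category, and the per-gene score could be displayed so that anyone can see why a prediction is still there. Where to set the weights and the cut-off is exactly the kind of thing that would need discussion.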
Regards,
Keith