Reference allele...?

How is the ‘Reference allele’ for a gene defined? E.g., for lev-1 the reference allele is e211, although apparently the underlying molecular defect has not been defined.

From Don Riddle in Worm II:

For all other genes, the reference allele is given in bold type. This is usually the mutation that has been studied in most detail, and a brief description of the relevant mutant phenotype follows.

This is in keeping with my impression. I should perhaps add a disclaimer here: what follows is my impression from what I’ve encountered and inferred from how the term is used; I don’t recall ever seeing a formal definition or discussion of what it means to be the “reference allele”.

The “reference allele” is perhaps something of a bygone notion nowadays, as genes are usually cloned quite quickly and it’s often relatively straightforward to identify or even generate a molecular null. Back when loci were usually defined genetically, by mutation, and were described to the community informally or in published papers long before their cloning was even anticipated, it was necessary to name a reference allele: one that would be the most convenient, standard way for everyone to work with that locus, whether to test if they had an allele of the same gene or to test if mutations in that gene affected their phenotype of interest. The reference allele would therefore meet as many as possible of the following criteria, some of which simply amount to its being the easiest and most rewarding of the available alleles to work with:

  1. It would be among the first group of mutations identified as alleles of the gene (obviously).
  2. It should have a consistent and robust phenotype, with high penetrance, at least as much so as any other then-available allele.
  3. Something likely to be a genetic null would be nice (so: as strong as any other known allele, not enhanced in trans to a deficiency).
  4. On the other hand, the reference allele might be chosen based on the lack of confounding pleiotropies. If for example one allele is fully penetrant for a visible phenotype and is fertile while a second, stronger allele causes recessive sterility and is often maternally rescued for the visible phenotype, the first allele is often more useful to work with, to compare against, and to do complementation tests with, even though the second allele is far more likely to be a molecular null.
  5. Recessive is good. One of the uses of a reference allele is in complementation testing, so you can see why it matters that the allele causes your phenotype of interest recessively. Still, sometimes this isn’t the deciding factor, especially if none of the alleles that cause the phenotype of interest recessively do so at high penetrance.
  6. You’ll note in the Riddle quote above that the reference allele is the allele likely to have been studied in most detail, and you’ll notice in the Genetics panel of the WormBase gene page for lev-1 that the reference allele e211 is associated with more phenotypes than any other mutation. Obviously, a lot of this is circular (people who encountered this gene in other contexts would test their own alleles and the reference allele), but some of it isn’t. If, in characterizing the alleles you have, you discover that one allele or one class of alleles causes more phenotypic defects than the others (and if you’re fairly sure this really is an effect of your allele rather than of a linked mutation), that’s an excellent reason to name that allele, or one from that class of alleles, as your reference mutation.

Also note the term “canonical allele.”