Collecting ortholog data sets for C.elegans

During the IWM there were some requests to unify the existing ortholog data sets in WormBase and include additional ones.

If you have any ortholog data sets that are not listed below and want to have them included, please send them to us:

  • TreeFam
  • Inparanoid (2007 Sonnhammer dataset)
  • KOGs
  • EnsEMBL Compara
  • OrthoMCL (Caltech)
  • ortholog dataset from the C.briggsae paper (Hillier et al.)

Ideas and proposals on how to improve the integration of the data are welcome.
The current strategy is to merge the datasets based on EnsEMBL IDs (linked by OMIM / UniProt xrefs).
I don’t have a unified confidence score yet, so the number of datasets predicting a given ortholog pair might serve as a proxy.
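The merge strategy described above could be sketched roughly as follows. This is only an illustration, not the actual WormBase pipeline: it assumes each dataset has already been mapped to a shared ID space (for example EnsEMBL IDs), and the function names and gene IDs (`geneA`, `ENSG_X`, ...) are hypothetical placeholders.

```python
from collections import defaultdict


def merge_ortholog_sets(datasets):
    """Merge per-method ortholog pairs and record which methods support each pair.

    datasets: dict mapping a method name to an iterable of
              (elegans_gene_id, other_gene_id) pairs. The IDs are assumed
              to be already mapped to one shared ID space.
    Returns a dict mapping each pair to the set of supporting methods.
    """
    support = defaultdict(set)
    for method, pairs in datasets.items():
        for pair in pairs:
            support[tuple(pair)].add(method)
    return dict(support)


def rank_by_support(support):
    """Order pairs by number of supporting methods (most first), then by ID."""
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical toy input; real data would come from the datasets listed above.
datasets = {
    "TreeFam": [("geneA", "ENSG_X"), ("geneB", "ENSG_Y")],
    "Inparanoid": [("geneA", "ENSG_X")],
    "OrthoMCL": [("geneA", "ENSG_X"), ("geneB", "ENSG_Z")],
}

merged = merge_ortholog_sets(datasets)
ranked = rank_by_support(merged)

for pair, methods in ranked:
    print(pair, len(methods), sorted(methods))
```

Counting methods treats every dataset as equally reliable and independent, which is a simplification; a weighted score could replace `len(methods)` once a unified scoring system exists.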

Michael

The C.elegans - C.briggsae orthologs are now unified, and each source dataset has its own Analysis connected to the orthologs it predicts.
They are not shown on the website’s default gene page, but you can use the Treeview to see them.

As a first use case, I used orthologs where at least 3 predictions are in consensus (3+ Analysis objects connected) to annotate orthologous C.briggsae genes with a gene name based on the C.elegans one.
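A minimal sketch of this consensus filter, under some assumptions: the input maps each (C.elegans, C.briggsae) pair to the set of Analysis names supporting it, names are only transferred when a C.briggsae gene ends up with exactly one candidate, and the `Cbr-` prefix, the function name and all IDs are hypothetical choices for illustration rather than the method actually used.

```python
def transfer_gene_names(support, elegans_names, min_support=3, prefix="Cbr-"):
    """Propose C.briggsae gene names from well-supported ortholog pairs.

    support:       dict mapping (elegans_id, briggsae_id) to the set of
                   analyses (prediction methods) supporting that pair.
    elegans_names: dict mapping elegans_id to its gene name.
    min_support:   minimum number of agreeing analyses required.
    prefix:        assumed prefix for the transferred name.
    """
    candidates = {}
    for (elegans_id, briggsae_id), analyses in support.items():
        if len(analyses) < min_support:
            continue  # not enough predictions in consensus
        name = elegans_names.get(elegans_id)
        if name is None:
            continue  # the C.elegans gene has no name to transfer
        candidates.setdefault(briggsae_id, set()).add(name)

    # Skip C.briggsae genes with more than one candidate name (ambiguous).
    return {
        briggsae_id: prefix + next(iter(names))
        for briggsae_id, names in candidates.items()
        if len(names) == 1
    }


# Hypothetical toy data.
support = {
    ("ce1", "cb1"): {"TreeFam", "Inparanoid", "OrthoMCL"},
    ("ce2", "cb2"): {"TreeFam", "OrthoMCL"},  # only 2 predictions
    ("ce3", "cb3"): {"TreeFam", "Inparanoid", "OrthoMCL", "KOGs"},  # unnamed
}
elegans_names = {"ce1": "abc-1", "ce2": "abc-2"}

new_names = transfer_gene_names(support, elegans_names)
print(new_names)
```

Lowering `min_support` would name more genes at the cost of weaker evidence per assignment.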

I am experimenting with requiring fewer supporting predictions, combined with mutation rate and/or dN/dS, to assign more gene names. I will write in more detail when I have a working method.

We added another ortholog set, OMA, from the ETH in Zurich (http://www.cbrg.ethz.ch/research/orthologous); it is included in WS184.
The OMA group will also provide us with an updated version using the last stable C.briggsae data from WS180.

New features for the WS188 release:

  • more OMIM orthologs (this time from EnsEMBL)
  • Inparanoid H.sapiens orthologs for C.elegans

As before, they will be in the Ortholog_other tag of the Gene.

If you want something else added, give a shout and we will try to add it.
All EnsEMBL species have already been available as Ortholog_other for a while.