As the frozen release WS215 will soon go live, I would like to make a few quick comments on the Brugia gene model update we did:
Based on general feedback, the original gene set was missing quite a few genes and needed improvement, so Erich Schwarz (Caltech) volunteered to predict a new gene set using Augustus (the details will be in the WS215 release notes).
As a first version, I merged both sets and removed duplicates; this merged set is what will be in WS215. Overlapping predictions on the same strand were treated as one gene with multiple isoforms. The result is a gene count comparable to that of other nematodes, and, based on homology and protein motifs, the merged set appears to be of slightly higher overall quality than the previous version.
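The merge step can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used for WS215: the `Model` record, the `merge_genesets` function and the example IDs are hypothetical; a duplicate is assumed to mean an identical CDS structure on the same contig and strand; and "overlapping" is assumed to mean overlapping CDS spans, applied transitively (A overlapping B and B overlapping C puts all three in one gene).

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Model:
    """Hypothetical gene-model record: one predicted CDS."""
    model_id: str
    contig: str
    strand: str     # "+" or "-"
    exons: tuple    # sorted ((start, end), ...) CDS segments, 1-based inclusive

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]


def merge_genesets(set_a, set_b):
    """Merge two prediction sets, drop duplicates, group overlaps into genes.

    Returns a list of genes, each a list of isoform Models.
    """
    # 1. Remove duplicates: identical CDS structure on the same contig/strand.
    seen = {}
    for m in list(set_a) + list(set_b):
        seen.setdefault((m.contig, m.strand, m.exons), m)  # first copy wins
    models = sorted(seen.values(),
                    key=lambda m: (m.contig, m.strand, m.start, m.end))

    # 2. Cluster: same contig and strand with overlapping spans -> one gene.
    #    A sweep over sorted starts makes the grouping transitive.
    genes = []
    for _, group in groupby(models, key=lambda m: (m.contig, m.strand)):
        cluster, cluster_end = [], None
        for m in group:
            if cluster and m.start <= cluster_end:
                cluster.append(m)
                cluster_end = max(cluster_end, m.end)
            else:
                if cluster:
                    genes.append(cluster)
                cluster, cluster_end = [m], m.end
        if cluster:
            genes.append(cluster)
    return genes


# Usage with made-up models:
old_set = [Model("tigr1", "ctg1", "+", ((100, 200), (300, 400)))]
new_set = [
    Model("aug1", "ctg1", "+", ((100, 200), (300, 400))),  # duplicate of tigr1
    Model("aug2", "ctg1", "+", ((350, 500),)),             # overlaps tigr1
    Model("aug3", "ctg1", "-", ((150, 250),)),             # other strand
]
genes = merge_genesets(old_set, new_set)
```

Keeping the first copy of a duplicate means the first set passed in wins ties, so swapping the arguments changes which identifier survives.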
Due to the nature of the assembly, there are still quite a few truncated gene models where the N-terminus couldn't be identified for various reasons (gaps in the assembly being the most common), so not all genes “start” with a methionine.
For WS216 I plan to remove isoforms that are truncated versions of longer CDSes, which means we may lose (retire) about 600 isoforms, mostly TIGR models.
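One way to decide whether an isoform is a truncated version of a longer CDS is sketched below. It assumes, hypothetically, that an isoform counts as truncated when clipping a longer isoform of the same gene to the shorter one's span reproduces the shorter one exactly, i.e. the two differ only by bases trimmed from the ends. The function names and the dict-of-exon-tuples input are illustrative, and the check ignores reading frame.

```python
def clip(exons, lo, hi):
    """Clip sorted (start, end) CDS segments to the window [lo, hi]."""
    out = []
    for s, e in exons:
        s, e = max(s, lo), min(e, hi)
        if s <= e:  # drop segments that fall entirely outside the window
            out.append((s, e))
    return tuple(out)


def is_truncated_version(short, long):
    """True if `short` equals `long` with bases trimmed only from its ends.

    Strand-agnostic: trimming at either end (N- or C-terminal) counts.
    """
    if tuple(short) == tuple(long):
        return False
    lo, hi = short[0][0], short[-1][1]
    return clip(long, lo, hi) == tuple(short)


def isoforms_to_retire(gene):
    """gene: dict of isoform_id -> exon tuple. Returns ids of truncated isoforms."""
    retire = set()
    for sid, sx in gene.items():
        for lid, lx in gene.items():
            if sid != lid and is_truncated_version(sx, lx):
                retire.add(sid)
                break
    return retire


# Usage with a made-up gene:
gene = {
    "full": ((100, 200), (300, 400), (500, 600)),
    "skip": ((100, 200), (500, 600)),   # exon skipping: kept
    "trunc": ((350, 400), (500, 600)),  # end-trimmed copy of "full": retired
}
retired = isoforms_to_retire(gene)
```

Clipping rather than a plain base-subset test keeps exon-skipping isoforms such as "skip", whose bases are also a subset of the full-length model but which are genuine alternative splice forms.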
We are also in talks with a local group to get gene models based on RNAseq into a future release.
On a longer timescale, a new Brugia assembly is coming (hopefully this year), and new predictions will be made on it. I will try to map as many gene IDs as possible from the old assembly to the new one, but expect quite a few new and lost genes.