Question: Genome Annotation Quality Measure
Asked 8.5 years ago by Darked89 (Barcelona, Spain):

I have the output of several gene prediction programs (using the term loosely):

  • de novo predictors (Augustus, GlimmerHMM, Geneid, SNAP, Genscan)
  • RNA-Seq mapped with Tophat and Cufflinks
  • EST sets mapped with PASA (same species)/GMAP (2M+ plant ESTs)
  • 10 protein sets mapped with exonerate

I also have:

  • a semi-curated set of 1000 proteins (non-chimeric, non-truncated, with correct size and similarity to other plant proteins, though exon borders may at times be wrong or small introns retained); ca. 700 of them are unique at the 50% protein similarity level (uclust)

  • 400+ CEGMA predictions based on HMM profiles of a conserved set of genes

So far Augustus with RNA-Seq evidence support is way ahead at predicting sensible genes. I have been comparing the numbers of "exons" shared between these sets, and I am puzzled by the large number of exons unique to almost every method used. While this would be normal for de novo predictors, I was hoping that the homology-based methods (i.e. exonerate protein-to-genome, GMAP, and Cufflinks) would overlap far more. I am going to work on improving the individual programs' results where possible (retraining, better filtering of ESTs/proteins, etc.).
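One way to make the exon-sharing comparison above reproducible is to reduce each program's GFF3 output to a set of exact exon coordinates and intersect the sets. A minimal sketch, assuming GFF3 input; the file names in the usage comment are hypothetical:

```python
def exon_set(gff3_path):
    """Collect exact exon coordinates (seqid, strand, start, end) from a GFF3 file."""
    exons = set()
    with open(gff3_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # keep only well-formed feature lines of type "exon"
            if len(fields) < 8 or fields[2] != "exon":
                continue
            seqid, _source, _type, start, end, _score, strand = fields[:7]
            exons.add((seqid, strand, int(start), int(end)))
    return exons

# Hypothetical usage:
# augustus = exon_set("augustus.gff3")
# cufflinks = exon_set("cufflinks.gff3")
# print(len(augustus & cufflinks), "exons with identical coordinates")
```

Exact-coordinate intersection is deliberately strict; relaxing it to exons that merely overlap (or share one boundary) would show how much of the disagreement between methods is just shifted exon borders rather than entirely different exons.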

I am looking for some genome-wide measure telling me how well I am doing, be it for an individual gene prediction program or some prediction combiner, as compared to, say, Arabidopsis and two or three other recently annotated plant genomes. Any ideas?

Tags: genome, gene • 4.1k views
Modified 8.5 years ago by Daniel Standage • written 8.5 years ago by Darked89
Answered 8.5 years ago by iw9oel_ad:

Annotation Edit Distance (AED), devised by Eilbeck et al., might suit your needs, or at least be a place from which to start. From the paper: "AED is similar to performance measures employed by the gene-prediction community, but takes into account aspects of annotations not well addressed by conventional sensitivity/specificity measures such as alternative splicing."
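As a rough illustration of the idea (see the paper for the authoritative definition), AED for a single gene model can be computed from nucleotide-level sensitivity and specificity against an evidence or reference annotation:

```python
def aed(reference_exons, predicted_exons):
    """Annotation Edit Distance between two annotations of one gene, each
    given as a list of (start, end) exon intervals (1-based, inclusive).
    AED = 1 - (SN + SP) / 2, computed over nucleotide positions, so
    0 means perfect agreement and 1 means no overlap at all."""
    ref = set()
    for start, end in reference_exons:
        ref.update(range(start, end + 1))
    pred = set()
    for start, end in predicted_exons:
        pred.update(range(start, end + 1))
    overlap = len(ref & pred)
    sn = overlap / len(ref)   # sensitivity: fraction of reference recovered
    sp = overlap / len(pred)  # specificity: fraction of prediction supported
    return 1.0 - (sn + sp) / 2.0
```

For example, a prediction matching the reference exactly scores 0.0, while one whose exons overlap the reference by half scores 0.5.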


Great link. I've never seen this paper. I'll need to read it in detail. Probably has some applicability to what I'm currently working on!

Written 8.5 years ago by Daniel Standage

...However, AED also looks at individual annotations rather than giving a global measure, which I think is what is being asked here.

Written 8.5 years ago by Daniel Standage

Well, a global measure is a matter of aggregating the individual measurements. The paper plots cumulative AED for some genome releases over time. Or one might restrict the calculation to a subset of particularly important features for that organism, YMMV.
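Aggregating per-gene AED scores into a global view along those lines can be as simple as a cumulative distribution; a sketch with hypothetical scores:

```python
def cumulative_aed(scores, steps=11):
    """Fraction of gene models at or below each AED threshold in [0, 1].
    AED 0 = perfect agreement with evidence, 1 = no agreement."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return [(t, sum(s <= t for s in scores) / len(scores)) for t in thresholds]

# Hypothetical per-gene scores for one annotation release:
# release = [0.0, 0.1, 0.4, 0.9]
# for threshold, fraction in cumulative_aed(release):
#     print(f"AED <= {threshold:.1f}: {fraction:.0%} of models")
```

Plotting these curves for successive releases (or for subsets of important features) gives the kind of genome-wide trend the paper shows.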

Written 8.5 years ago by iw9oel_ad

Thanks a lot, I will need some time to digest it.

Written 8.5 years ago by Darked89
Answered 8.5 years ago by Daniel Standage (Davis, California, USA):

There was a thread that talked about this a while back with regard to individual gene models...indeed, you responded to it! (How to compare gene models) So if I understand correctly, you now want a higher-level view rather than a per-gene-model comparison?

I spent some time recently looking for software to do this and found little. Consequently, I have been working on a Perl application to compare two sets of annotations. One set is treated as a reference, the other as predictions, and it compares exon structure and coding-nucleotide agreement.

It's not ready for prime time yet (there are a few small bugs and it still doesn't handle alternative splicing very well), but I've used it to do some comparisons and it has been very helpful. By default it provides a separate comparison for each gene model, but I should be able to force it to compare the whole sequence all at once (alternative splicing might complicate that, but I may be able to get something to work).
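One common way to do the per-gene exon-structure part of such a comparison is to match intron chains, which tolerates UTR-length differences at the outer exon boundaries; a sketch of that approach (not the actual tool's logic):

```python
def intron_chain(exons):
    """Sorted list of introns implied by a gene model's (start, end) exons."""
    exons = sorted(exons)
    return [(exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)]

def same_structure(ref_exons, pred_exons):
    """Two multi-exon gene models agree structurally if their intron chains
    match exactly, even when the first and last exons differ in extent."""
    return intron_chain(ref_exons) == intron_chain(pred_exons)
```

For instance, two models whose internal splice sites all agree count as a structural match even if one extends its terminal exons further than the other.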

Let me know if you would like to talk details.

Modified 5.9 years ago by Michael Dondrup • written 8.5 years ago by Daniel Standage

Re GFF comparison tools: in a less hectic moment I am going to list everything I have found (with some comments) on one page. It seems some people use Eval from the Michael R. Brent lab: http://mblab.wustl.edu/software/eval/

Written 8.5 years ago by Darked89