Question: Is There A Standard Format For GO Term Enrichment Results?
7.8 years ago, Chris Mungall (320) wrote:

I am fairly certain there is no such standard, but I'm also fairly certain some other people must have thought about this.

One advantage of a standard format is that it would simplify the running of multiple enrichment tools in parallel and comparing or combining results. This is particularly useful to us within the GO consortium, as we would like to compare analyses between newer/older versions of the ontology and annotations. A more ambitious aim is for publications that include GO enrichment results to provide these in a standard format, to simplify replicating results.
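As a rough illustration of the comparison use case, here is a minimal sketch that assumes each run has already been reduced to a mapping from term ID to p-value. The function name `compare_runs`, the 0.05 cut-off, and the p-values are all made up for the example; nothing here comes from an existing tool.

```python
def compare_runs(old, new, alpha=0.05):
    """Compare two enrichment runs given as {term_id: p_value} dicts.

    Returns the terms significant in both runs, only in the old run,
    and only in the new run, using the same cut-off for both.
    """
    sig_old = {term for term, p in old.items() if p <= alpha}
    sig_new = {term for term, p in new.items() if p <= alpha}
    return {
        "shared": sorted(sig_old & sig_new),
        "lost": sorted(sig_old - sig_new),
        "gained": sorted(sig_new - sig_old),
    }


# Made-up p-values, for illustration only.
old_run = {"GO:0006915": 0.001, "GO:0008150": 0.20, "GO:0007049": 0.03}
new_run = {"GO:0006915": 0.002, "GO:0008150": 0.04}
diff = compare_runs(old_run, new_run)
```

With a shared format, the dicts passed in could be read directly from each tool's output rather than assembled by hand.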

Note that it would not be necessary for all tools to be conformant in order for the standard to be successful. Converters could be provided to rewrite the ad-hoc output of heterogeneous tools to the standard form. However, it would help to have buy-in from some of the more popular tools.
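A converter of this kind could be quite small. The sketch below assumes a hypothetical tool that writes tab-separated output with columns named `GO_ID`, `P` and `Genes`; those column names, the common field names, and the function `convert_rows` are invented for illustration and do not describe any real tool.

```python
import csv
import io

# Hypothetical mapping from one tool's column names to common field names.
COLUMN_MAP = {"GO_ID": "term_id", "P": "p_value", "Genes": "gene_ids"}


def convert_rows(tsv_text, column_map=COLUMN_MAP, gene_sep=","):
    """Rewrite tab-separated tool output into a list of common-format dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        # Rename the tool's columns to the common field names.
        record = {common: row[native] for native, common in column_map.items()}
        # Normalise types: p-value as float, genes as a list of IDs.
        record["p_value"] = float(record["p_value"])
        record["gene_ids"] = [
            gene.strip() for gene in record["gene_ids"].split(gene_sep) if gene.strip()
        ]
        records.append(record)
    return records


# Made-up input row, for illustration only.
example = "GO_ID\tP\tGenes\nGO:0006915\t0.001\tgeneA, geneB\n"
converted = convert_rows(example)
```

Each additional tool would then only need its own column mapping, not a new parser.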

I have listed some desiderata for such a standard:

  • An abstract specification with different serializations for different purposes (tabular, JSON, XML, RDF)
  • Extensibility
  • Use of ontology terms in place of free text to describe algorithms, parameters and data processing (for example, the Ontology for Biomedical Investigations (OBI) has a rich collection of these)

Minimal information:

  • Tool name + algorithm + version
  • Input token list + token type (e.g. symbol)
  • Background token list + token type (if provided)
  • Token-gene ID mapping (plus unmatched tokens)
  • Algorithm parameters (cut-offs, algorithm selected, etc.)
  • Ontology ID + version
  • Gene association set ID / species + version
  • List of results - for each result:
    • term ID
    • optional term metadata
    • list of gene IDs (+ optional gene metadata)
    • scoring metadata (p-values, rank, etc.)
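To make the minimal information list concrete, here is one possible sketch of it as a data model with a JSON serialization. All class and field names (`EnrichmentReport`, `TermResult`, `token_to_gene`, and so on) are assumptions chosen for the example, not a proposed standard, and the values in the usage part are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermResult:
    term_id: str                       # e.g. a GO term ID
    gene_ids: List[str]                # genes supporting this term
    scores: Dict[str, float]           # scoring metadata (p-value, rank, ...)
    term_label: Optional[str] = None   # optional term metadata


@dataclass
class EnrichmentReport:
    tool_name: str
    tool_version: str
    algorithm: str
    parameters: Dict[str, object]      # cut-offs and other settings
    input_tokens: List[str]
    input_token_type: str              # e.g. "symbol"
    token_to_gene: Dict[str, str]      # token -> gene ID mapping
    unmatched_tokens: List[str]
    ontology_id: str
    ontology_version: str
    association_set_id: str
    association_set_version: str
    background_tokens: Optional[List[str]] = None
    background_token_type: Optional[str] = None
    results: List[TermResult] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested TermResult entries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values throughout, for illustration only.
report = EnrichmentReport(
    tool_name="example-tool",
    tool_version="0.0",
    algorithm="hypergeometric",
    parameters={"p_value_cutoff": 0.05},
    input_tokens=["geneA", "geneB", "geneX"],
    input_token_type="symbol",
    token_to_gene={"geneA": "ID:1", "geneB": "ID:2"},
    unmatched_tokens=["geneX"],
    ontology_id="GO",
    ontology_version="unspecified",
    association_set_id="example-associations",
    association_set_version="unspecified",
    results=[TermResult("GO:0006915", ["ID:1", "ID:2"], {"p_value": 0.001})],
)
serialized = report.to_json()
```

The same abstract model could equally be written out as a table, XML or RDF; JSON is used here only because it is the shortest to show.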

Optional information:

  • Unique identifier/URI for the results
  • Metadata on input token set (e.g. "genes up-regulated in diabetes")
  • graphical output

Is this of general interest? If so, does the above sound like a good start, and what would be an appropriate forum for future discussions? Is there an existing tool whose output might be a good candidate for standardization?

gene function format enrichment • 3.3k views
modified 5.9 years ago by Biostar ♦♦ 20 • written 7.8 years ago by Chris Mungall (320)

Good point and interesting paper. Yes, my list is biased towards simple gene lists. I think we would probably want a fairly generic core and extensions for GSEA, genomic intervals, etc.

Reply written 7.8 years ago by Chris Mungall (320)

Interesting topic, and clearly a need for this. Another piece of metadata that would be good to capture is whether the analysis is done at the gene list or genomic interval level and, if the latter, whether any corrections for genomic structure are applied, e.g.

Reply written 7.8 years ago by Casey Bergman (18k)

7.8 years ago, Qdjm (1.9k) wrote:

Hi Chris,

Good idea. One important thing that appears to be missing from your minimal information is the subset of GO terms tested. Often people only test for enrichment of GO terms at a given level in the hierarchy or with a minimum number of associations.
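One way to capture this would be to store both the filter and the resulting list of tested term IDs. The sketch below assumes annotations are available as a mapping from term ID to a set of gene IDs; the function name `describe_tested_terms`, its parameters, and the example data are hypothetical.

```python
def describe_tested_terms(annotations, min_genes=5, max_genes=None):
    """Select the terms to test and record how they were selected.

    annotations: dict mapping term ID -> set of annotated gene IDs.
    Returns a dict holding the filter settings and the tested term IDs.
    """
    tested = []
    for term_id, genes in annotations.items():
        count = len(genes)
        if count < min_genes:
            continue  # too few associations
        if max_genes is not None and count > max_genes:
            continue  # too many associations (term too general)
        tested.append(term_id)
    return {
        "filter": {"min_genes": min_genes, "max_genes": max_genes},
        "tested_term_ids": sorted(tested),
    }


# Made-up annotations, for illustration only.
annotations = {
    "GO:0006915": {"g1", "g2", "g3"},
    "GO:0007049": {"g1"},
    "GO:0008150": {"g1", "g2", "g3", "g4"},
}
tested = describe_tested_terms(annotations, min_genes=2, max_genes=3)
```

Storing the explicit term list as well as the filter means the tested set can be recovered even if the annotations change later.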

written 7.8 years ago by Qdjm (1.9k)

Good point. As well as a subset of terms, we can also imagine a subset of relationships. We can even imagine a superset of terms, where dynamic grouping classes are created using other ontologies.

I'm not sure what the best solution is. I can imagine for advanced cases we might want to bundle the entire application ontology used. But this would be overkill for the more basic scenario.

Reply written 7.8 years ago by Chris Mungall (320)

The basic scenario requires representing the subset of the terms. This is standard practice. I haven't seen any cases of only using a subset of the relationships and can think of only one case in which terms have been grouped together.

Reply written 7.8 years ago by Qdjm (1.9k)

7.8 years ago, Allpowerde (1.2k) wrote:

I'm all for standards, but it probably goes as it always has: the (accidental) format of the most heavily used program is adopted as the standard. So why not contact the developers of these programs and get their opinion (and cooperation):

written 7.8 years ago by Allpowerde (1.2k)

What about DAVID?

Reply written 7.8 years ago by Qdjm (1.9k)

Not to leave anyone out - DAVID and many others are listed here:

(let us know if your favourite tool isn't there)

You're right about how bioinformatics standards typically evolve - hopefully we can be a little more proactive here.

We should absolutely contact the developers of these tools. I wanted to check first that there wasn't some existing effort. I imagine the next step will be to take this discussion to an (open) Google group or something similar.

Reply written 7.8 years ago by Chris Mungall (320)