Hello People,

I have a question about how statistics for KEGG Orthology (KO) annotations should be reported. When I annotate assembled transcripts in an RNA-seq experiment or predicted genes in a metagenomic experiment, I often do pathway annotation with KEGG IDs (i.e., KO terms).

When we report the statistics of the KO annotation, we write them as

Cellular transport = X

Metabolism = Y, and so on.

This is where my question comes in: what should this value (X or Y) represent? The number of KO IDs under that category, or the number of sequences bearing KO IDs from that category? Which of these makes more sense?

If you use the first option, you are describing the ontology in a crude way: cellular transport = X would mean that the ontology has X terms related to cellular transport. You could restrict this to the terms present in your sequence annotations, but that does not change the fact that you are still describing the ontology. With the second option, cellular transport = X would mean that you have X sequences annotated with "cellular transport" or one of its child terms. In the end, it depends on what you want to describe: the ontology or your sequences.
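
To make the difference concrete, here is a minimal Python sketch that computes both numbers from the same annotation. Everything in it is an assumption for illustration: the sequence names, the KO labels (`KO_1`, `KO_2`, `KO_3` are placeholders, not real KEGG entries), the category assignments, and the helper `count_by_category`. It only counts KO terms that actually occur in the sequence annotations, i.e. the restricted version of the first option.

```python
from collections import defaultdict

# Toy annotation: placeholder sequence names and KO labels (not real KEGG entries).
seq_to_kos = {
    "seq1": {"KO_1"},
    "seq2": {"KO_1"},
    "seq3": {"KO_2"},
    "seq4": {"KO_3"},
    "seq5": {"KO_1", "KO_3"},  # one sequence can carry several KO terms
}

# Hypothetical mapping from each KO label to its top-level categories.
ko_to_categories = {
    "KO_1": {"Metabolism"},
    "KO_2": {"Metabolism"},
    "KO_3": {"Cellular transport"},
}


def count_by_category(seq_to_kos, ko_to_categories):
    """Return (distinct KO terms per category, distinct sequences per category)."""
    kos_per_cat = defaultdict(set)
    seqs_per_cat = defaultdict(set)
    for seq, kos in seq_to_kos.items():
        for ko in kos:
            for cat in ko_to_categories.get(ko, ()):
                kos_per_cat[cat].add(ko)    # option 1: describes the ontology
                seqs_per_cat[cat].add(seq)  # option 2: describes the sequences
    ko_counts = {cat: len(kos) for cat, kos in kos_per_cat.items()}
    seq_counts = {cat: len(seqs) for cat, seqs in seqs_per_cat.items()}
    return ko_counts, seq_counts


ko_counts, seq_counts = count_by_category(seq_to_kos, ko_to_categories)
print("KO terms per category: ", ko_counts)
print("Sequences per category:", seq_counts)
```

With this toy data, "Metabolism" gets 2 under the first option (two distinct KO terms) but 4 under the second (four sequences carry one of those terms), so the two numbers can differ a lot for the same category. Sets are used so that a sequence or a KO term is counted only once per category.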

