Hi, I have a list of GO IDs and their over-represented annotations, but most of them are children or sub-divisions of a main/parent term. How can I hide them, or maybe statistically merge them under the main/parent category?
Example Set:
GO    Term
GO:0006351    transcription, DNA-dependent
GO:0032774    RNA biosynthetic process
GO:0016070    RNA metabolic process
GO:0019222    regulation of metabolic process
GO:0050794    regulation of cellular process
GO:0050789    regulation of biological process
GO:0065007    biological regulation
GO:0048522    positive regulation of cellular process
GO:0031323    regulation of cellular metabolic process
GO:0090304    nucleic acid metabolic process
GO:0080090    regulation of primary metabolic process
GO:0060255    regulation of macromolecule metabolic process
GO:0006139    nucleobase-containing compound metabolic process
GO:0048518    positive regulation of biological process
So, the last terms, like positive regulation of cellular process and positive regulation of biological process, can go under broad terms like regulation of cellular process and regulation of biological process.
Can anyone suggest a tool that can do this textually or graphically?
Cheers
P.S. Revigo can do it, but I am looking for something that can be run from the terminal or from R.
I would like to know why you want to do that. In general I think the opposite approach is more useful. In that case you would calculate the significant child terms first, remove (prune) them from the tree and then calculate whether the parent term is still significant. We actually have a paper on that, see: http://dx.doi.org/10.1093/bioinformatics/bts366 . Merging everything in the parent terms often leads to conclusions like: "we did a diet study and found that 'metabolism' was affected". Sigh...
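To make the "prune the child, then re-test the parent" idea concrete, here is a minimal sketch in plain Python. It is not taken from the linked paper and may differ from it in the details. The gene names, the two annotation sets, the helper names and the 0.05 cutoff are all made up for illustration, and the test is an ordinary one-sided hypergeometric test.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n are annotated."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def enrichment_p(term_genes, study, universe):
    """One-sided enrichment p-value of a term in the study set."""
    annotated = term_genes & universe
    k = len(annotated & study)
    return hypergeom_sf(k, len(universe), len(annotated), len(study))

def prune_then_test(child_genes, parent_genes, study, universe, alpha=0.05):
    """Test the child, then re-test the parent without the child's genes."""
    p_child = enrichment_p(child_genes, study, universe)
    p_parent_raw = enrichment_p(parent_genes, study, universe)
    # Only prune when the child itself is significant.
    remaining = parent_genes - child_genes if p_child < alpha else parent_genes
    p_parent_pruned = enrichment_p(remaining, study, universe)
    return p_child, p_parent_raw, p_parent_pruned

# Toy data (hypothetical): 100 genes, the child term is a subset of the parent.
universe = {f"g{i}" for i in range(100)}
child = {f"g{i}" for i in range(10)}
parent = {f"g{i}" for i in range(20)}
study = {f"g{i}" for i in range(8)} | {"g50", "g51"}

p_child, p_parent_raw, p_parent_pruned = prune_then_test(child, parent, study, universe)
print(p_child, p_parent_raw, p_parent_pruned)
```

In this toy case the parent looks enriched only because it inherits the child's genes. Once those are removed, no study gene is left in the parent, so the child is the term worth reporting.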
Chris, I will read your paper; it looks promising. I take your point. I am still learning gene ontology and had the notion that only parent terms are important and that we are mostly not interested in the children. Just think of a case like "Regulation of Biological processes" followed by "Positive Regulation of Biological processes" and "Negative Regulation of Biological processes": there, one would like to see just the parent, wouldn't one?
Thanks
Cheers
The problem with doing that is: how far up the ancestor tree do you go? ReviGo has a nice implementation to get semantic relevance out of the GO structure. I was planning to try to replicate their algorithm in Python but I just can't find the time. Another alternative is to use GO slim annotations instead, but I often find those to be too vague.
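For what it's worth, a very crude stand-in for that kind of redundancy reduction fits in a few lines. To be clear, this is not ReviGo's algorithm: instead of a semantic similarity computed from the GO structure, it uses the Jaccard overlap of the gene sets behind each term, and it greedily keeps the most significant term of each overlapping group. The gene sets, p-values, function names and the 0.7 cutoff below are invented for illustration.

```python
def jaccard(a, b):
    """Overlap of two gene sets, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reduce_redundancy(term_genes, pvalues, cutoff=0.7):
    """Greedy filter: walk terms from most to least significant and absorb
    a term into the first kept term it overlaps with strongly."""
    kept = []
    absorbed = {}
    for term in sorted(pvalues, key=pvalues.get):
        for rep in kept:
            if jaccard(term_genes[term], term_genes[rep]) >= cutoff:
                absorbed.setdefault(rep, []).append(term)
                break
        else:
            kept.append(term)
    return kept, absorbed

# Hypothetical enrichment output: genes and p-values are made up.
term_genes = {
    "GO:0006351": {"a", "b", "c", "d"},
    "GO:0032774": {"a", "b", "c", "d", "e"},
    "GO:0016070": {"a", "b", "c", "d", "e", "f", "g", "h"},
    "GO:0050789": {"a", "x", "y", "z"},
}
pvalues = {
    "GO:0006351": 1e-6,
    "GO:0032774": 1e-5,
    "GO:0016070": 1e-4,
    "GO:0050789": 1e-3,
}

kept, absorbed = reduce_redundancy(term_genes, pvalues)
print(kept)      # representative terms
print(absorbed)  # which terms were folded into which representative
```

The nice side effect is that you never have to decide how far up the tree to go: the cutoff decides which terms are close enough to be shown as one.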
Hey, I assume one should go up to the main parent term in the tree and then jump to the next tree. I now think the best way is to cherry-pick the terms one wants to see from the basket of highly significant terms and represent them either visually or textually.
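Going up to the top-most term that is still in the list can be sketched like this, using the regulation terms from the example set. The `parents` mapping is a small hand-written excerpt of is_a links typed in for illustration only; for real use it should be read from the ontology file. The function names are hypothetical as well.

```python
def ancestors(term, parents):
    """All terms reachable by following is_a links upwards."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def collapse_to_top(terms, parents):
    """Group every listed term under the listed ancestors that have
    no ancestor of their own in the list."""
    terms = set(terms)
    groups = {t: [] for t in terms if not (ancestors(t, parents) & terms)}
    for t in terms:
        anc = ancestors(t, parents)
        for top in groups:
            if top in anc:
                groups[top].append(t)
    return {top: sorted(kids) for top, kids in groups.items()}

# Hand-written excerpt of is_a relations (an assumption, not loaded from GO).
parents = {
    "GO:0050789": ["GO:0065007"],                # regulation of biological process
    "GO:0050794": ["GO:0050789"],                # regulation of cellular process
    "GO:0048518": ["GO:0050789"],                # positive regulation of biological process
    "GO:0048522": ["GO:0048518", "GO:0050794"],  # positive regulation of cellular process
    "GO:0019222": ["GO:0050789"],                # regulation of metabolic process
    "GO:0031323": ["GO:0019222", "GO:0050794"],  # regulation of cellular metabolic process
}
listed = ["GO:0065007", "GO:0050789", "GO:0050794", "GO:0048518",
          "GO:0048522", "GO:0019222", "GO:0031323"]

collapsed = collapse_to_top(listed, parents)
print(collapsed)
```

With this excerpt, everything ends up under GO:0065007 (biological regulation), which is exactly the "too vague" outcome mentioned above, so cherry-picking or a similarity cutoff is probably still needed on top.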