Question: How to look up GO terms associated to a certain organism?
Diego wrote (2.1 years ago):

I want to get all the GO terms under the biological process GO domain for a given organism. Right now, I need all the biological process terms associated with all gene products of E. coli (ideally excluding IEA annotations).

I'm new to the field of bioinformatics (I'm a software engineer by education), and right now I'm struggling with the large number of interconnected databases I keep finding.

At first, I tried with the R library biomaRt[1], but I discovered they don't support the Ensembl Bacteria database anymore.

Then I tried to do it by hand using AmiGO (actually GOlr[2]), but then I noticed that GO Central focuses on human health-related annotations[3], so I assumed I would lose a lot of useful annotations for non-human species. After that I wanted to try QuickGO, but with so many databases, repositories and tools around, it's hard to know how exhaustive my query results will be.

Should I stick with the Ensembl databases and ditch biomaRt as a library? Or is a combination of GOlr (AmiGO) and GAnnotation (QuickGO) enough?
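One way to gauge how exhaustive each source is would be to collect the GO IDs each one returns and compare the sets. The following is only a minimal sketch: the function name `compare_sources`, the source labels and the GO IDs are placeholders for illustration, not real query results.

```python
def compare_sources(terms_by_source):
    """Compare sets of GO IDs returned by different sources.

    Returns the union of all IDs plus, per source, the IDs only that
    source returned ("unique") and the IDs it lacks ("missing").
    """
    union = set().union(*terms_by_source.values())
    report = {}
    for name, terms in terms_by_source.items():
        others = set().union(
            *(t for n, t in terms_by_source.items() if n != name)
        )
        report[name] = {"unique": terms - others, "missing": union - terms}
    return union, report


# Placeholder data: these IDs stand in for real query results.
EXAMPLE = {
    "AmiGO": {"GO:0000001", "GO:0000002"},
    "QuickGO": {"GO:0000002", "GO:0000003"},
}

if __name__ == "__main__":
    union, report = compare_sources(EXAMPLE)
    print(len(union), "distinct terms overall")
    for name, info in report.items():
        print(name, "unique:", sorted(info["unique"]),
              "missing:", sorted(info["missing"]))
```

If one source's "missing" set stays empty across several organisms, that is some evidence (not proof) that it covers the others.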

References

EagleEye (Sweden) wrote (2.1 years ago):

Check out Gene Set Clustering based on Functional annotation (GeneSCF)

Please use a recent version of GeneSCF for successful results.

You can extract all GO terms from Gene Ontology for your desired organism (example below: Escherichia coli [ecocyc]) using a simple GeneSCF command:

(Complete GO for E.coli)

./prepare_database -db=GO_all -org=ecocyc

OR (Biological Process)

./prepare_database -db=GO_BP -org=ecocyc

OR (Molecular Functions)

./prepare_database -db=GO_MF -org=ecocyc

OR (Cellular Components)

./prepare_database -db=GO_CC -org=ecocyc
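If more than one of these tables is needed, the calls can be scripted. This is only a sketch: it reuses the `-db=` and `-org=` flags exactly as written above, assumes `prepare_database` sits in the current directory, and the helper names `build_commands` and `run_all` are made up for illustration.

```python
import subprocess

# Database names copied from the commands above.
GO_DATABASES = ("GO_all", "GO_BP", "GO_MF", "GO_CC")


def build_commands(org="ecocyc", databases=GO_DATABASES,
                   tool="./prepare_database"):
    """Return one argument list per database, mirroring the commands above."""
    return [[tool, f"-db={db}", f"-org={org}"] for db in databases]


def run_all(commands):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try
    # without GeneSCF installed. Call run_all(build_commands()) to execute.
    for cmd in build_commands():
        print(" ".join(cmd))
```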

Advantages

  • Real-time analysis: no need to depend on enrichment tools being updated.

  • Easy for computational biologists to integrate this simple tool into their NGS pipelines.

  • GeneSCF supports more organisms.

  • Enrichment analysis for multiple gene lists in a single run.

  • Enrichment analysis for multiple gene lists using multiple source databases (GO, KEGG, REACTOME and NCG) in a single run.

  • Download complete GO terms/pathways/functions with associated genes as a simple table in a plain text file (check "Two step process" in the "GeneSCF USAGE" section).


You have an awesome tool right there, and it sounds exactly like something I would use. Reading more about your work, I stumbled upon this. Where exactly are you getting the data? Are you "just" using GO, KEGG, REACTOME and NCG? Also, why does REACTOME_web have more results than GeneSCF if the latter uses the former as a source?

Thanks for your answer, I'm really excited to test this tool, sounds closer to the Holy Grail I was searching for.

written 2.1 years ago by Diego

Thanks for your valuable feedback.

  • The current version of GeneSCF is restricted to accessing data from Gene Ontology, KEGG, Reactome and NCG. Future versions may support more databases (with more surprises).

  • It is possible that the Reactome FTP site provides an earlier version than ReactomeWeb. As we described in the publication, GeneSCF accesses the current version of the Reactome dataset from 'http://www.reactome.org/download/current/ReactomePathways.gmt.zip'. (It is possible that ReactomeWeb uses a recent dataset that is regularly and minimally altered, but provides only a processed stable version via FTP.)

For more information on GeneSCF read the publication / Ask your questions on Biostar / Email (santhilal.subhash@gu.se).

written 2.1 years ago by EagleEye
karl.stamm (United States) wrote (2.1 years ago):

The Gene Ontology is an ontology of genes. It annotates molecular functions (agnostic across species), cellular components (also agnostic across species) and biological processes (somewhat to mostly agnostic across species). If you need something specific to E. coli, take a look at http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki


Thanks, I'll take a look at the website! I know GO is species-agnostic by design, but what I need is a list, as exhaustive as possible, of all gene products annotated to the GO term "biological process" (GO:0008150) for a certain organism. My problem is that I don't know where to ask, since I don't know how exhaustive the results I'm getting are. For example, here I'm querying all gene products annotated to GO:0008150 and filtering by both E. coli and E. coli K-12 (taxon IDs 562 and 83333 respectively). A quick processing of that data would then give me all biological process GO terms associated with E. coli, but given that AmiGO, QuickGO and Ensembl all exist, I don't know whether I'm missing more databases to look up.
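The "quick processing" step could look roughly like the sketch below. It assumes the annotations are available as a GAF 2.x file (tab-separated, with the qualifier in column 4, GO ID in column 5, evidence code in column 7, aspect in column 9 and taxon in column 13). The function name `bp_terms_from_gaf`, the `_row` helper and the sample rows are made up for illustration; they are not real annotations.

```python
# 0-based column positions in a GAF 2.x line.
COL_QUALIFIER = 3
COL_GO_ID = 4
COL_EVIDENCE = 6
COL_ASPECT = 8
COL_TAXON = 12


def bp_terms_from_gaf(lines, taxon_ids=("562", "83333"),
                      skip_evidence=("IEA",)):
    """Collect biological-process GO IDs for the given taxa.

    Skips comment lines, negated ("NOT") annotations and any
    evidence code listed in skip_evidence.
    """
    wanted_taxa = {f"taxon:{t}" for t in taxon_ids}
    terms = set()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        if cols[COL_ASPECT] != "P":
            continue  # keep only the biological process aspect
        if cols[COL_EVIDENCE] in skip_evidence:
            continue
        if "NOT" in cols[COL_QUALIFIER].split("|"):
            continue
        # The first taxon listed is the gene product's own organism.
        if cols[COL_TAXON].split("|")[0] in wanted_taxa:
            terms.add(cols[COL_GO_ID])
    return terms


def _row(go_id, evidence, aspect, taxon, qualifier=""):
    """Build a made-up 17-column GAF row for the demo below."""
    cols = ["DB", "ID1", "geneA", qualifier, go_id, "REF:1", evidence, "",
            aspect, "", "", "protein", taxon, "20200101", "DB", "", ""]
    return "\t".join(cols) + "\n"


SAMPLE_GAF = [
    "!gaf-version: 2.2\n",
    _row("GO:0000001", "EXP", "P", "taxon:83333"),
    _row("GO:0000002", "IEA", "P", "taxon:83333"),   # dropped: IEA
    _row("GO:0000003", "IDA", "F", "taxon:83333"),   # dropped: not aspect P
    _row("GO:0000004", "IMP", "P", "taxon:9606"),    # dropped: other taxon
    _row("GO:0000005", "IDA", "P", "taxon:562", "NOT"),  # dropped: negated
    _row("GO:0000006", "IDA", "P", "taxon:562"),
]

if __name__ == "__main__":
    print(sorted(bp_terms_from_gaf(SAMPLE_GAF)))
    # prints ['GO:0000001', 'GO:0000006']
```

For a real file, the same function can be fed an open file handle, for example `bp_terms_from_gaf(open("annotations.gaf"))`, where the file name is a placeholder. Annotations may also be filed under other strain-level taxon IDs, so the taxon list is worth checking against the file itself.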

I'll check the wiki; it will probably help me understand this particular case better.

written 2.1 years ago by Diego