Question: Use Python (e.g. mygene, biomart, db2db) to convert GO Terms to all genes in pathway
2.6 years ago by
United States
jolespin100 wrote:

I want to figure out how to use Python to get all genes associated with a GO term. I tried the BioMart Python API, but it is awkward to use, and the answer in "Retrieve All Genes Associated With A Go Term" is for R. I'm trying to use mygene, but GO is not available as a scope (the input term), only as a field (the output term); see "Gene Id Conversion Tool". I need to do this for ~6000 GO pathways. I got it to work using bioservices db2db, but I get timed out. I set a pause of 5 seconds between searches and I'm still getting the timeout. (If anyone wants to know how to use this, it's really useful.)

Does anyone know a tool in Python I can use to do this that won't time me out?  



mygene pathway ontology python gene

biomartian -d rnorvegicus_gene_ensembl  -i external_gene_name -o go_id | shuf -n 10
Lpcat1  GO:0005509
Klb GO:0005975
LOC498555   GO:0003735
Map3k12 GO:0046777
Hoxb1   GO:0045944
Cir1    GO:0006397
Rhoc    GO:0005525
Casr    GO:0060613
Cib1    GO:1900026
Onecut1 GO:0002064
written 2.6 years ago by Endre Bakken Stovner
23 months ago by
United States
Newgene340 wrote:

You can use the mygene Python module to query GO terms for matching genes:

import mygene
mg = mygene.MyGeneInfo()

To query just one GO term:

mg.query('GO:0023026', size=1000)

By default, it returns only the first ten matched genes; to get more, set a higher size such as 1000. Note that some GO terms have a very large number of associated genes, and you probably don't want to retrieve them all (not that useful anyway), so capping the returned gene list at 1000 is a reasonable setting (it also helps avoid timeouts).
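As a rough sketch of how the result of a single query could be inspected: the response from mg.query is a dictionary with a "hits" list and a "total" count, so comparing the two shows whether the size cap cut the list short. The helper name summarize_go_query is made up for illustration, and it assumes each hit carries a "symbol" field (hits without one are skipped).

```python
def summarize_go_query(result):
    """Return (symbols, truncated) for one mg.query(...) response.

    Hypothetical helper, not part of mygene. Assumes the response has
    the usual "hits" list and "total" count.
    """
    hits = result.get("hits", [])
    total = result.get("total", len(hits))
    # Some hits may lack a gene symbol; skip those rather than fail.
    symbols = [hit["symbol"] for hit in hits if "symbol" in hit]
    # True when the server matched more genes than were returned.
    truncated = total > len(hits)
    return symbols, truncated


# Example (needs network access and the third-party mygene package):
#   import mygene
#   mg = mygene.MyGeneInfo()
#   symbols, truncated = summarize_go_query(mg.query('GO:0023026', size=1000))
```

If truncated comes back True, the term has more associated genes than the size cap allowed.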

With mygene, you can also query multiple GO terms in a batch:

mg.querymany(["GO:0023026", "GO:0002503"], scopes='go', size=1000)

In the returned result, each gene hit contains a "query" attribute whose value is the corresponding GO term.

You might also want to restrict the number of GO terms in one batch, so that you don't overload the server.
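A minimal sketch of that batching idea, under some assumptions: the helper names (chunked, genes_by_go_term, mygene_query_batch), the batch size of 50, and the 1-second pause are all illustrative choices, not recommendations from mygene. The grouping relies on the "query" attribute of each hit described above, and the query function is passed in so the grouping logic can be tried without touching the server.

```python
import time
from collections import defaultdict


def chunked(items, n):
    """Yield successive slices of at most n items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


def genes_by_go_term(go_terms, query_batch, batch_size=50, pause=1.0):
    """Map each GO term to the genes returned for it.

    query_batch is any callable that takes a list of GO terms and
    returns a list of hit dicts, each with a "query" key (as
    mg.querymany does). batch_size and pause are arbitrary defaults.
    """
    genes = defaultdict(list)
    batches = list(chunked(list(go_terms), batch_size))
    for idx, batch in enumerate(batches):
        for hit in query_batch(batch):
            if hit.get("notfound"):
                continue  # term with no matching gene
            # Prefer the gene symbol, fall back to the gene id.
            genes[hit["query"]].append(hit.get("symbol", hit.get("_id")))
        if idx < len(batches) - 1:
            time.sleep(pause)  # be polite between batches
    return dict(genes)


def mygene_query_batch(batch):
    """Query one batch through mygene (needs network access)."""
    import mygene  # third-party; imported lazily on purpose
    mg = mygene.MyGeneInfo()
    # fields="symbol" is an assumption to keep the responses small.
    return mg.querymany(batch, scopes="go", fields="symbol", size=1000)
```

Usage would look like genes_by_go_term(["GO:0023026", "GO:0002503"], mygene_query_batch), which returns a dict keyed by GO term.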
