Question: What is the difference between this Biomart Code, Org.Hs.Db code and SQL code?
2.1 years ago by
sinifdosyalari12h wrote:

My aim is to get all the genes annotated to a Gene Ontology (GO) term as Entrez IDs. I currently have 3 different solutions that achieve this. Below are my example solutions for human and GO:0005634 (nucleus).


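library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Note: recent Ensembl releases name this attribute 'entrezgene_id'
gene_list <- getBM(attributes = c('entrezgene'),
                   filters = 'go',
                   values = "GO:0005634",
                   mart = ensembl)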

library(org.Hs.eg.db)
gene_list <- data.frame(mget("GO:0005634", org.Hs.egGO2ALLEGS)[[1]])

running an SQL query on the GO servers

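 -- Join conditions assume the standard GO MySQL schema
 -- (term, association, gene_product, species, dbxref)
 SELECT
  gene_product.symbol AS gp_symbol
 FROM term
  INNER JOIN association ON (term.id = association.term_id)
  INNER JOIN gene_product ON (association.gene_product_id = gene_product.id)
  INNER JOIN species ON (gene_product.species_id = species.id)
  INNER JOIN dbxref ON (gene_product.dbxref_id = dbxref.id)
 WHERE
  term.acc = 'GO:0005634'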

You can try running the same code at this link. The first two solutions give me Entrez IDs, but the last one gives gene symbols, and I think there is no way to get Entrez IDs from Gene Ontology (please correct me if I am wrong). So I use the mygene library in Python to convert the gene symbols to Entrez IDs, searching for each symbol in both the symbol scope and the alias scope.
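The symbol-to-Entrez step described above could be sketched roughly as follows. This is only an illustration, not the exact script used: `fetch_hits` and `split_hits` are hypothetical helper names, the `fields` and `species` arguments are assumptions about how the query was set up, and the stub hits are hand-written to mimic the list of dicts that `querymany` returns.

```python
def fetch_hits(symbols):
    """Query MyGene.info for each symbol (needs network and `pip install mygene`)."""
    import mygene  # third-party package, imported lazily on purpose

    client = mygene.MyGeneInfo()
    # Search both the official symbol and the alias scope, as described in the post.
    return client.querymany(
        symbols, scopes="symbol,alias", fields="entrezgene", species="human"
    )


def split_hits(hits):
    """Split querymany-style hits into {symbol: Entrez IDs} and unmatched symbols."""
    converted = {}
    seen = []
    for hit in hits:
        symbol = hit["query"]
        if symbol not in seen:
            seen.append(symbol)
        # Hits flagged 'notfound', or hits without an Entrez ID, carry no 'entrezgene' key.
        if "entrezgene" in hit:
            converted.setdefault(symbol, set()).add(str(hit["entrezgene"]))
    unmatched = [s for s in seen if s not in converted]
    return converted, unmatched


# Hand-written stub instead of a live query, so the sketch runs offline.
stub_hits = [
    {"query": "TP53", "_id": "7157", "entrezgene": "7157"},
    {"query": "Q14547", "notfound": True},
]
converted, unmatched = split_hits(stub_hits)
print(converted)
print(unmatched)
```

Collecting a set of IDs per symbol matters because an alias can match more than one gene, which by itself can inflate or shift the resulting ID list.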

When I compare the Entrez gene IDs obtained from the three methods with each other, I get this:

[Figure: Venn diagram of the Entrez gene IDs returned by the three methods]
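For reference, a comparison like this can be reproduced with plain set operations. The sketch below is a minimal version under stated assumptions: the three ID lists are made-up placeholders, not the real results, and `normalize_ids` and `venn_regions` are hypothetical helper names. The normalisation step is there because the three sources can return the same ID as an integer, a float or a string, which would otherwise look like a mismatch.

```python
def normalize_ids(raw_ids):
    """Return a set of Entrez IDs as strings, dropping missing values."""
    cleaned = set()
    for value in raw_ids:
        if value is None:
            continue
        text = str(value).strip()
        if text == "" or text.upper() in {"NA", "NAN"}:
            continue
        # Strip a trailing ".0" left behind when IDs were read as floats.
        if text.endswith(".0"):
            text = text[:-2]
        cleaned.add(text)
    return cleaned


def venn_regions(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Placeholder IDs for illustration only (not real query results).
biomart_ids = normalize_ids([1, 2, 3, None, 4.0])
orgdb_ids = normalize_ids(["2", "3", "4", "5"])
go_sql_ids = normalize_ids(["3", "4", "6", "NA"])

regions = venn_regions(biomart_ids, orgdb_ids, go_sql_ids)
print(regions)
```

The seven region sizes always add up to the size of the union, which is a quick sanity check before plotting.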

So my question is:

Why do these return such different results?

Another problem that I have is:

converting all gene symbols into gene ids

Using the mygene Python library with human and nucleus, I am able to get 4955 Entrez gene IDs, and I am left with 980 gene symbols that could not be converted into Entrez IDs. Below are 6 gene symbols that the mygene library is not able to convert:

'A2RUA4', 'B3KY84', 'ENSP00000368480', 'OTTHUMP00000081030', 'Q14547', 'XP_933608'
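One thing worth checking is whether these are gene symbols at all. By format, the six examples look like protein-level accessions (UniProt, Ensembl protein, Havana/Vega protein, RefSeq protein), which would explain why a lookup in the symbol and alias scopes finds nothing. The sketch below classifies identifiers by pattern. It is a heuristic only: the patterns are assumptions based on the usual accession formats, and `classify` is a hypothetical helper name.

```python
import re

# Ordered (label, pattern) pairs; each pattern must match the whole identifier.
PATTERNS = [
    (
        "UniProt accession",
        re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
    ),
    ("Ensembl protein ID", re.compile(r"ENSP\d{11}")),
    ("Havana/Vega protein ID", re.compile(r"OTTHUMP\d{11}")),
    ("RefSeq protein accession", re.compile(r"[NXY]P_\d+(?:\.\d+)?")),
]


def classify(identifier):
    """Guess the identifier type from its format alone."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "gene symbol or other"


unconverted = [
    "A2RUA4", "B3KY84", "ENSP00000368480",
    "OTTHUMP00000081030", "Q14547", "XP_933608",
]
for identifier in unconverted:
    print(identifier, "->", classify(identifier))
```

If most of the 980 leftovers fall into these classes, retrying them in mygene with a protein-level scope instead of the symbol scope may recover some of them; the exact scope names should be checked against the MyGene.info documentation.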

I mentioned more about that problem in this link but couldn't reach a conclusion.

Any help on my problems would be appreciated and I am also open to new solutions.


tagging: Mike Smith

written 2.1 years ago by genomax
2.1 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche wrote:

The first one uses Ensembl's BioMart, whereas the third one queries the GO MySQL database directly. Ensembl has a different set of annotations than GO. It used to produce its own annotations; it may now take them from GO, but with a time lag, which means an older version than the current GO. The second one, org.Hs.eg.db, also uses a specific version of GO (in the current version of the package, GO from 2018-03-28).
In short: different databases = different results.
