My aim is to get all genes annotated to a Gene Ontology (GO) term as Entrez IDs. I currently have three different solutions that do this. Below are my example solutions for human and GO:0005634 (nucleus).
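1. Using biomaRt:

```r
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Recent Ensembl releases name this attribute "entrezgene_id" instead of "entrezgene"
gene.data <- getBM(attributes = c("entrezgene"), filters = "go",
                   values = "GO:0005634", mart = ensembl)
```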
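2. Using org.Hs.eg.db:

```r
library(org.Hs.eg.db)
# org.Hs.egGO2ALLEGS maps a GO ID to the Entrez IDs annotated to that term or to
# any of its child terms; a gene can appear several times (once per evidence code),
# so the IDs are flattened and de-duplicated
gene_list <- unique(unlist(mget("GO:0005634", org.Hs.egGO2ALLEGS), use.names = FALSE))
print(gene_list)
```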
3. Running an SQL query on the GO servers:

```sql
SELECT gene_product.symbol AS gp_symbol
FROM term
  INNER JOIN association ON (term.id = association.term_id)
  INNER JOIN gene_product ON (association.gene_product_id = gene_product.id)
  INNER JOIN species ON (gene_product.species_id = species.id)
  INNER JOIN dbxref ON (gene_product.dbxref_id = dbxref.id)
  INNER JOIN db ON (association.source_db_id = db.id)
WHERE term.acc = 'GO:0005634'
  AND species.ncbi_taxa_id = '9606';
```
You can try running the same query at this link. The first two solutions give me Entrez IDs, but the last one gives gene symbols, and I think there is no way to get Entrez IDs directly from the GO database (please correct me if I am wrong). So I use the mygene Python library to convert the gene symbols to Entrez IDs, searching both the symbol scope and the alias scope.
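For reference, the symbol-plus-alias lookup described above could look roughly like the sketch below. It assumes the `mygene` package's `MyGeneInfo.querymany` with `returnall=True`, which returns a dict with `out`, `dup` and `missing` keys; `fetch_entrez` and `split_hits` are hypothetical helper names, not part of mygene. The parsing step is kept separate so it can be checked without network access.

```python
from collections import defaultdict


def fetch_entrez(symbols):
    """Query mygene.info for Entrez IDs (needs network access and `pip install mygene`)."""
    import mygene  # imported here so split_hits below can be used without mygene installed

    mg = mygene.MyGeneInfo()
    # scopes="symbol,alias" searches both the official symbol and the alias fields
    return mg.querymany(symbols, scopes="symbol,alias", fields="entrezgene",
                        species="human", returnall=True)


def split_hits(result):
    """Group querymany(returnall=True) output into unique, ambiguous and missing queries."""
    ids_per_query = defaultdict(set)
    for hit in result["out"]:
        # skip not-found queries ({"query": ..., "notfound": True}) and hits without an Entrez ID
        if hit.get("notfound") or "entrezgene" not in hit:
            continue
        ids_per_query[hit["query"]].add(str(hit["entrezgene"]))
    mapped = {q: next(iter(ids)) for q, ids in ids_per_query.items() if len(ids) == 1}
    ambiguous = {q: sorted(ids) for q, ids in ids_per_query.items() if len(ids) > 1}
    missing = sorted(result.get("missing", []))
    return mapped, ambiguous, missing
```

Queries that land in `ambiguous` are worth inspecting by hand: aliases are not unique, so an alias search can map one symbol to several genes and inflate or distort the final ID count.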
When I compare the Entrez gene IDs obtained from the three methods with each other, I get this:
So my question is:
Why do these return such different results?
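One factor that may contribute (an assumption worth checking, not a confirmed diagnosis) is whether a method counts only genes annotated directly to GO:0005634 or also genes annotated to its descendant terms. The SQL query above filters on `term.acc = 'GO:0005634'` alone, so it only sees direct associations, while `org.Hs.egGO2ALLEGS` is documented to include child terms as well; which convention the biomaRt `go` filter follows should be verified separately. The toy sketch below uses made-up term and gene IDs (`GO:A`, `g1`, etc. are placeholders, not real GO data) to show how propagating annotations down the graph enlarges the result set.

```python
from collections import defaultdict

# Toy GO fragment: child term -> parent terms. Made-up IDs, not real GO terms.
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B"],
}

# Toy direct annotations: term -> genes annotated to exactly that term
DIRECT = {
    "GO:A": {"g1"},
    "GO:B": {"g2"},
    "GO:C": {"g4"},
    "GO:D": {"g1", "g3"},
}


def descendants(term, parents):
    """Return every term that has `term` as an ancestor (excluding `term` itself)."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def propagated_genes(term, direct, parents):
    """Genes annotated to `term` or to any of its descendants."""
    genes = set(direct.get(term, set()))
    for t in descendants(term, parents):
        genes |= direct.get(t, set())
    return genes


direct_only = DIRECT["GO:A"]
with_children = propagated_genes("GO:A", DIRECT, PARENTS)
print(sorted(direct_only))                   # ['g1']
print(sorted(with_children))                 # ['g1', 'g2', 'g3', 'g4']
print(sorted(with_children - direct_only))   # ['g2', 'g3', 'g4']
```

The same set operations (intersection and difference) can be applied to the three real ID lists to see which IDs each method contributes on its own.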
Another problem I have is converting all gene symbols into Entrez gene IDs.
Using the mygene Python library for human and nucleus, I get 4955 Entrez gene IDs, but 980 gene symbols cannot be converted into Entrez IDs. Below are 6 of the symbols that mygene is unable to convert:
'A2RUA4', 'B3KY84', 'ENSP00000368480', 'OTTHUMP00000081030', 'Q14547', 'XP_933608'
I described this problem in more detail at this link but couldn't reach a conclusion.
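Judging by their format, the six unconverted identifiers listed above are not gene symbols: A2RUA4, B3KY84 and Q14547 match the UniProt accession pattern, ENSP00000368480 is an Ensembl protein ID, OTTHUMP00000081030 is a Vega/HAVANA protein ID, and XP_933608 is a RefSeq predicted protein. A symbol/alias search would be expected to miss these. A minimal sketch for sorting such leftovers by namespace is below; the labels `uniprot`, `ensembl.protein` and `refseq` are assumed to correspond to mygene.info query scopes (check the mygene.info documentation), while `vega_protein` is a hypothetical label with no assumed scope.

```python
import re

# Identifier patterns. The UniProt regex follows UniProt's published accession format;
# the others are simple prefix checks. Labels are assumed mygene scopes, except
# "vega_protein", which is a hypothetical label only.
PATTERNS = [
    ("uniprot", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("ensembl.protein", re.compile(r"^ENSP\d{11}$")),
    ("refseq", re.compile(r"^[NX]P_\d+(?:\.\d+)?$")),
    ("vega_protein", re.compile(r"^OTTHUMP\d{11}$")),
]


def classify(identifier):
    """Return the first matching namespace label, or 'symbol' if no pattern matches."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "symbol"


def group_by_namespace(identifiers):
    """Bucket identifiers by namespace label, preserving input order within each bucket."""
    groups = {}
    for ident in identifiers:
        groups.setdefault(classify(ident), []).append(ident)
    return groups
```

Each non-symbol group could then be re-queried with its matching scope instead of `symbol,alias`. Note that a short gene symbol could in principle match the UniProt pattern, so the `uniprot` group is worth a quick manual check.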
Any help with these problems would be appreciated, and I am also open to new solutions.