Question: How To Annotate Many Ensembl Gene Ids To Find The Function Of Each Gene And Involvement With Disease
5.5 years ago by
United Kingdom
lillo.sim40 wrote:


I was wondering if there is an easy way to annotate many Ensembl gene IDs to find the function of each gene and its involvement with disease, where one has been reported (from GWAS, lab studies, observational studies, etc.). The problem is that I have many gene IDs, so I can't look up each one individually. Does anybody know of an R package or other software that allows batch queries for this information?


gene function annotation • 2.5k views
ADD COMMENTlink modified 5.5 years ago by Istvan Albert ♦♦ 78k • written 5.5 years ago by lillo.sim40

You might look into the NCBI2R package in CRAN. You should be able to use it to get GO terms, GWAS associations, and OMIM information. I should note that you'll need Entrez IDs rather than Ensembl IDs, but that's a trivial conversion.
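
As a rough illustration of the ID conversion mentioned above, here is a minimal sketch using biomaRt. The helper name `ensembl_to_entrez` is hypothetical, and the attribute name `entrezgene_id` is an assumption (older Ensembl releases exposed it as `entrezgene`), so check `listAttributes(mart)` for your release before relying on it.

```r
# Hypothetical helper: map Ensembl gene IDs to Entrez gene IDs via biomaRt.
# Assumption: the Entrez attribute is named "entrezgene_id" in your Ensembl
# release (older releases used "entrezgene").
ensembl_to_entrez <- function(ensembl_ids, mart,
                              entrez_attribute = "entrezgene_id") {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", entrez_attribute),
    filters    = "ensembl_gene_id",
    values     = ensembl_ids,
    mart       = mart
  )
}

# The query needs network access, so it only runs in an interactive session.
if (interactive()) {
  mart <- biomaRt::useMart(biomart = "ensembl",
                           dataset = "hsapiens_gene_ensembl")
  id_map <- ensembl_to_entrez(c("ENSG00000245532"), mart)
  print(id_map)
}
```

Passing the whole vector of IDs in one call is what makes this a batch lookup; a gene may map to zero or several Entrez IDs, so the result can have fewer or more rows than the input.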

ADD REPLYlink written 5.5 years ago by Devon Ryan87k

This seems like a nice package, I'll try it out now! thank you

ADD REPLYlink written 5.5 years ago by lillo.sim40

You could use the R interface to BioMart (the biomaRt package) to do Ensembl lookups, GO terms, etc. (see "How to get Ensembl ID (gene, transcript, protein) mapping information?" for links)

ADD REPLYlink written 5.5 years ago by Alex Reynolds27k

Thank you Alex, but biomaRt doesn't seem to contain enough clinical information, unless I am using it incorrectly?

library(biomaRt)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
results <- getBM(attributes = c("clinical_significance"), filters = "ensembl_gene_id", values = c("EENSG00000245532"), mart = mart)

and it is empty…

I have also tried the following attributes, but the result is not really what I am looking for:

getBM(attributes = c("gene_biotype", "description", "wikigene_description", "pathology"),
      filters = "ensembl_gene_id", values = c("ENSG00000245532"), mart = mart)

It tells me that it is "nuclear paraspeckle assembly transcript 1 (non-protein coding)", and the pathology is empty. I am looking for information on any clinical associations, for example from the NHGRI GWAS catalogue or from laboratory findings… Is there a field like "pathology" that is a bit more populated? Maybe I could use GO, but how do I get the description for each term instead of only the "go_id"? Thank you
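
One possible way to get GO term descriptions rather than bare "go_id" values is to request the term-name attribute in the same query. The sketch below assumes the Ensembl gene mart exposes `name_1006` (GO term name), `namespace_1003` (GO domain), and `phenotype_description` with `source_name` for phenotype annotations; these names are assumptions to verify with `listAttributes(mart)`. The helpers `fetch_go_terms`, `fetch_phenotypes`, and `collapse_terms` are hypothetical names used only for illustration.

```r
# Assumed attribute names; confirm them with listAttributes(mart).
fetch_go_terms <- function(ensembl_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"),
    filters = "ensembl_gene_id", values = ensembl_ids, mart = mart
  )
}

fetch_phenotypes <- function(ensembl_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "phenotype_description", "source_name"),
    filters = "ensembl_gene_id", values = ensembl_ids, mart = mart
  )
}

# getBM() returns one row per gene/term pair. This collapses the result to
# one row per gene, dropping missing or empty terms.
collapse_terms <- function(df, id_col, term_col) {
  keep <- !is.na(df[[term_col]]) & nzchar(df[[term_col]])
  df <- df[keep, , drop = FALSE]
  groups <- split(df[[term_col]], df[[id_col]])
  data.frame(
    id    = names(groups),
    terms = unname(vapply(groups,
                          function(x) paste(unique(x), collapse = "; "),
                          character(1))),
    stringsAsFactors = FALSE
  )
}

# The queries need network access, so they only run interactively.
if (interactive()) {
  mart <- biomaRt::useMart(biomart = "ensembl",
                           dataset = "hsapiens_gene_ensembl")
  go <- fetch_go_terms(c("ENSG00000245532"), mart)
  print(collapse_terms(go, "ensembl_gene_id", "name_1006"))
}
```

GO and phenotype attributes are queried separately here because combining several one-to-many attributes in a single `getBM()` call multiplies the rows returned.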

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by lillo.sim40

You have a typo in the gene ID in that first example ("EENSG…" instead of "ENSG…"). Perhaps that's why you don't get any results?
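
With many IDs, a typo like this is easy to miss, so it may help to screen the list before querying. The sketch below is a simple format check, not something biomaRt provides: it assumes human Ensembl gene IDs of the form "ENSG" followed by 11 digits, optionally with a version suffix such as ".5". The function name `is_human_ensembl_gene_id` is hypothetical.

```r
# Hypothetical helper: TRUE for strings shaped like human Ensembl gene IDs
# ("ENSG" + 11 digits, optional ".version" suffix). This checks the format
# only; it does not confirm that the ID exists in Ensembl.
is_human_ensembl_gene_id <- function(ids) {
  grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", ids)
}

ids <- c("ENSG00000245532", "EENSG00000245532")

# IDs that fail the format check and would return nothing from getBM()
bad_ids <- ids[!is_human_ensembl_gene_id(ids)]
print(bad_ids)
# prints: [1] "EENSG00000245532"
```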

ADD REPLYlink written 5.5 years ago by sarahhunter600