Question: Inconsistencies in Using biomaRt to Retrieve HGNC Names from GO Terms
JMallory wrote (2.2 years ago):

Returning to a project after some time, I noticed that a simple biomaRt script I had written to retrieve all HGNC gene symbols associated with a list of GO terms often returned gene lists that were inconsistent with the list obtained by searching for the same term directly in the amiGO2 browser.

A representative example of my code:

library(biomaRt)

## Provide a single GO term as a toy example
go_term <- "GO:0005543"

## Connect to the Ensembl mart and select the human gene dataset
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)

## Query HGNC symbols of genes annotated with the GO term
geneList <- getBM(attributes = "hgnc_symbol",
                  filters = "go",
                  values = go_term,
                  mart = ensembl)

I would like to better understand the reason for this inconsistency and whether I should revisit this aspect of my project.


Most likely you're now using a different version of Ensembl than the one you were using the first time.

Reply written 2.2 years ago by Jean-Karim Heriche

That is possible. But why, right now, does searching for this GO ID in the amiGO2 browser return 375 associated gene products, while scripting the same search with the code above yields only 85 genes? Perhaps I don't understand the role of the Ensembl mart object here.

Reply written 2.2 years ago by JMallory
Ben_Ensembl (EMBL-EBI) wrote (2.2 years ago):

Hi JMallory,

The biomaRt query you have used here retrieves only the genes annotated directly to that GO term.

The amiGO2 browser, by contrast, returns all genes associated with that GO term AND with its daughter terms (the more specific terms beneath it in the ontology), which is why its list is longer.
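To make the difference concrete, here is a minimal base-R sketch using a hypothetical four-term ontology (T1 to T4) and made-up gene names, not real GO data. `genes_direct()` mimics a direct-annotation lookup such as the "go" filter, while `genes_with_descendants()` mimics a lookup that also walks the daughter terms, as amiGO2 and the "go_parent_term" filter are described as doing here. Both helper names are illustrative assumptions, not biomaRt functions.

```r
# Toy illustration with HYPOTHETICAL terms and genes (not real GO data).

# Each term maps to its immediate daughter (child) terms.
children <- list(
  T1 = c("T2", "T3"),
  T2 = "T4",
  T3 = character(0),
  T4 = character(0)
)

# Genes annotated directly to each term.
direct <- list(
  T1 = "GENE_A",
  T2 = c("GENE_B", "GENE_C"),
  T3 = "GENE_D",
  T4 = c("GENE_E", "GENE_A")
)

# Return a term together with all of its descendants (depth-first).
descendants <- function(term, children) {
  out <- term
  for (child in children[[term]]) {
    out <- union(out, descendants(child, children))
  }
  out
}

# Genes annotated to the term itself only (analogous to the "go" filter).
genes_direct <- function(term, direct) {
  unique(direct[[term]])
}

# Genes annotated to the term or any descendant
# (analogous to the "go_parent_term" filter / amiGO2 behaviour).
genes_with_descendants <- function(term, children, direct) {
  terms <- descendants(term, children)
  unique(unlist(direct[terms], use.names = FALSE))
}

length(genes_direct("T1", direct))                      # 1 gene
length(genes_with_descendants("T1", children, direct))  # 5 genes
```

Because GO is a directed acyclic graph in which a term can have several parents, the sketch uses `union()` so that a term reachable by more than one path is only counted once.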

You can retrieve the list of genes associated with the GO term AND its daughter terms in Ensembl BioMart by using the "go_parent_term" filter:

<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
    <Filter name = "go_parent_term" value = "GO:0005543"/>
    <Attribute name = "ensembl_gene_id" />
    <Attribute name = "ensembl_transcript_id" />
</Dataset>

In the web-interface, this filter is found in the 'GENE ONTOLOGY' filter sub-menu. The query you performed originally is equivalent to using the GO term as a filter in the 'Input External References ID list' filter in the 'GENE' filter sub-menu.

Best wishes,

Ben
Ensembl Helpdesk


For ease of copy/paste, the R code you would use is:

geneList <- getBM(attributes = "hgnc_symbol",
                  filters = "go_parent_term", 
                  values = go_term, 
                  mart = ensembl)

Reply written 2.2 years ago by Mike Smith
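To see which genes the parent-term query adds, a small base-R sketch can diff the two results. `compare_gene_lists()` is a hypothetical helper, not part of biomaRt, and it assumes both inputs are the `hgnc_symbol` column of a `getBM()` result, which may contain empty strings for genes without an HGNC symbol as well as duplicates; both are dropped before comparing.

```r
# HYPOTHETICAL helper: summarise how two gene-symbol vectors differ.
compare_gene_lists <- function(direct, propagated) {
  # Drop NA/empty symbols and duplicates before comparing.
  clean <- function(x) unique(x[!is.na(x) & nzchar(x)])
  direct <- clean(direct)
  propagated <- clean(propagated)
  list(
    n_direct = length(direct),
    n_propagated = length(propagated),
    only_propagated = setdiff(propagated, direct),
    only_direct = setdiff(direct, propagated)
  )
}

# Usage with biomaRt (needs network access; assumes `ensembl` and
# `go_term` as defined in the question):
# direct <- getBM(attributes = "hgnc_symbol", filters = "go",
#                 values = go_term, mart = ensembl)
# propagated <- getBM(attributes = "hgnc_symbol", filters = "go_parent_term",
#                     values = go_term, mart = ensembl)
# res <- compare_gene_lists(direct$hgnc_symbol, propagated$hgnc_symbol)

# Offline toy example with made-up symbols:
res <- compare_gene_lists(c("A", "B", ""), c("A", "B", "C", "C", NA))
res$only_propagated  # "C"
```

Comparing cleaned, de-duplicated symbol sets avoids counting blank symbols or repeated rows as genes, which can otherwise inflate or muddy the counts when checking the result against amiGO2.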