Question: Mapping Entrez gene ids to Ensembl ids
5 weeks ago by
Natasha30
Natasha30 wrote:

I am mapping Entrez gene ids to ensembl ids using biomaRt.

library("biomaRt")
library("readxl")
tbl <- read_excel(input_excel)
tbl <- tbl["EntrezID"]
print(tbl, n=Inf)
listMarts()
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
filters = listFilters(ensembl)
entrezgene = tbl
genes <- getBM(
    filters="entrezgene_id",
    attributes=c("ensembl_gene_id","entrezgene_id"),
    values=entrezgene,
    mart=ensembl)
print(genes)

The input is read from the Excel file here, which comes from this study.

I get the following error message:

Error in `[[<-.data.frame`(`*tmp*`, vIdx, value = c("26155", "9636", "375790",  :
  replacement has 494 rows, data has 6909
Calls: getBM ... .generateFilterXML -> .splitValues -> [[<- -> [[<-.data.frame
Execution halted

Does "replacement has 494 rows, data has 6909" mean that Ensembl IDs were found for only 494 of the 6909 Entrez genes? Could someone look into this? I'm not sure how to resolve this error.

ensembl entrez biomart R gene • 142 views

Maybe try converting the Entrez IDs from character to numeric?

written 5 weeks ago by MatthewP260

Thank you. I added tbl <- mapply(tbl, FUN=as.numeric) before listMarts(). The query now runs without any error. However, I obtain 7720 entries, whereas my input has 6909 entries.

written 5 weeks ago by Natasha30

This is normal; many Entrez IDs map to multiple Ensembl IDs.
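To see where the extra rows come from, one option is to count how many result rows each Entrez ID produced. The sketch below uses a small made-up result table: the IDs are placeholders, not real identifiers, and `input_ids` is a hypothetical name for the vector that was passed to getBM(). With a real result you would use the `genes` data frame returned by getBM() instead.

```r
# Toy stand-in for a getBM() result; all IDs below are invented placeholders.
genes <- data.frame(
  entrezgene_id   = c(101, 101, 102, 103, 103, 103),
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E", "ENSG_F"),
  stringsAsFactors = FALSE
)
# Hypothetical input vector; 104 has no match in the toy result.
input_ids <- c(101, 102, 103, 104)

# Number of result rows per Entrez ID.
per_id <- table(genes$entrezgene_id)

# Entrez IDs that map to more than one Ensembl ID.
multi_mapped <- names(per_id[per_id > 1])

# Input IDs that returned no mapping at all.
unmapped <- setdiff(input_ids, genes$entrezgene_id)

# Total rows versus distinct Entrez IDs in the result.
n_rows   <- nrow(genes)
n_unique <- length(unique(genes$entrezgene_id))
```

If `n_rows` is larger than `n_unique`, the difference is explained by the IDs listed in `multi_mapped`.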

written 5 weeks ago by MatthewP260

Can you post the actual function that gives the error? Where is your getBM() call?

written 5 weeks ago by Haci120

Sorry, there was a formatting error. Now it's displayed.

written 5 weeks ago by Natasha30
5 weeks ago by
Mike Smith1.4k
EMBL Heidelberg / de.NBI
Mike Smith1.4k wrote:

The fundamental issue here is that your entrezgene object is a tibble with one column, whereas the getBM() function expects to be passed a vector. The internal code doesn't handle this very well and throws the error you saw.

When you run tbl <- mapply(tbl, FUN=as.numeric) you get back a matrix with a single column, which still isn't what the function expects, but it seems to handle it better. However, it is not required to provide the IDs as numbers (in fact, they will be converted back to characters inside getBM() anyway).
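The matrix result can be checked directly in base R. This is a minimal sketch that uses a plain data.frame as a stand-in for the tibble read from Excel; the three IDs are the ones visible in the error message.

```r
# Stand-in for the one-column table; IDs are stored as character strings.
tbl <- data.frame(EntrezID = c("26155", "9636", "375790"),
                  stringsAsFactors = FALSE)

# mapply() applies as.numeric() to each column and simplifies the result.
m <- mapply(tbl, FUN = as.numeric)

is.matrix(m)  # TRUE: a matrix, not a plain vector
dim(m)        # 3 1: three rows, one column

# Dropping the matrix structure gives an ordinary vector again.
ids <- as.character(m[, 1])
```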

The better solution is to just select the appropriate column from your tbl and return it as a vector, which you could do via: entrezgene <- tbl$EntrezID
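The difference between the two ways of selecting the column can be sketched as follows. A base data.frame stands in for the tibble here (both keep their table structure under single-bracket indexing), and the getBM() call is left as a comment because it needs the biomaRt package and a network connection.

```r
# Stand-in for the table read from Excel; IDs taken from the error message.
tbl <- data.frame(EntrezID = c("26155", "9636", "375790"),
                  stringsAsFactors = FALSE)

# Single brackets keep the table structure: still a one-column data frame.
as_table <- tbl["EntrezID"]
is.data.frame(as_table)   # TRUE

# `$` (or tbl[["EntrezID"]]) extracts the column as a plain vector.
entrezgene <- tbl$EntrezID
is.data.frame(entrezgene) # FALSE
is.character(entrezgene)  # TRUE

# The vector can then be passed as `values`, as in the original call:
# genes <- getBM(filters = "entrezgene_id",
#                attributes = c("ensembl_gene_id", "entrezgene_id"),
#                values = entrezgene,
#                mart = ensembl)
```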

I will update biomaRt to check it's being given the correct type of input, and provide some guidance if that's not the case.
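Until such a check exists, a small guard in your own script can catch the problem early. The function below is a hypothetical helper, not part of biomaRt and not how the package implements its check; it assumes a single filter, so `values` is expected to be one vector of IDs.

```r
# Hypothetical helper (not a biomaRt function): reject table-like input
# and return the IDs as a character vector.
check_filter_values <- function(values) {
  if (is.data.frame(values) || is.matrix(values)) {
    stop("'values' should be a vector; select a single column first, ",
         "e.g. tbl$EntrezID", call. = FALSE)
  }
  as.character(values)
}

# A plain vector passes through and is converted to character.
ok <- check_filter_values(c(26155, 9636))

# A one-column data frame is rejected with an informative error.
res <- try(check_filter_values(data.frame(EntrezID = "26155")), silent = TRUE)
```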
