Mapping Entrez gene ids to Ensembl ids
5.1 years ago
Natasha ▴ 40

I am mapping Entrez gene ids to ensembl ids using biomaRt.

library("biomaRt")
library("readxl")
tbl <- read_excel(input_excel)
tbl <- tbl["EntrezID"]
print(tbl, n=Inf)
listMarts()
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
filters = listFilters(ensembl)
entrezgene = tbl
genes <- getBM(
    filters="entrezgene_id",
    attributes=c("ensembl_gene_id","entrezgene_id"),
    values=entrezgene,
    mart=ensembl)
print(genes)

The input is read from the Excel file here, which comes from this study.

I get the following error message:

Error in `[[<-.data.frame`(`*tmp*`, vIdx, value = c("26155", "9636", "375790",  :
  replacement has 494 rows, data has 6909
Calls: getBM ... .generateFilterXML -> .splitValues -> [[<- -> [[<-.data.frame
Execution halted

Does "replacement has 494 rows, data has 6909" mean that only 494 of the 6909 Entrez genes were mapped to Ensembl ids? Could someone look into this? I'm not sure how to resolve this error.

R biomaRt gene entrez ensembl • 2.7k views

Maybe try converting entrez_id from character to numeric?


Thank you. I added tbl <- mapply(tbl, FUN=as.numeric) before listMarts(). The query now runs without any error. However, I obtain 7720 entries, whereas my input has 6909 entries.


This is normal: many Entrez IDs map to multiple Ensembl IDs.
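To see how much of the difference comes from one-to-many mappings, you could count the rows per Entrez ID in the result. This is only a rough sketch on a mock data frame: the IDs and the object named input_ids are made up for illustration, and only the column names follow the getBM() call in the question.

```r
# Mock of what getBM() might return; all IDs here are invented for illustration.
genes <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  entrezgene_id   = c(101, 101, 202, 303),
  stringsAsFactors = FALSE
)

# Number of Ensembl IDs returned for each Entrez ID
n_per_entrez <- table(genes$entrezgene_id)

# Entrez IDs that map to more than one Ensembl ID
multi_mapped <- names(n_per_entrez)[as.vector(n_per_entrez > 1)]

# Input IDs that returned no row at all (404 is a hypothetical unmapped ID)
input_ids <- c(101, 202, 303, 404)
unmapped <- setdiff(input_ids, genes$entrezgene_id)

print(multi_mapped)
print(unmapped)
```

With a real result, the number of rows can be larger than the input because of multi-mapped IDs, and some input IDs may still be missing from the output.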


Can you post the actual function that gives the error? Where is your getBM() call?


Sorry, there was a formatting error. It's displayed now.

5.1 years ago
Mike Smith ★ 2.1k

The fundamental issue here is that your entrezgene object is a tibble with one column, whereas the getBM() function expects to be passed a vector. The internal code doesn't handle this very well and throws the error you saw.
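The message itself comes from base R's data frame assignment, as the `[[<-.data.frame` entry in the call stack suggests. As a minimal illustration (this is not biomaRt's internal code, just an assumed stand-in with a plain data frame), assigning a replacement whose length does not fit the number of rows raises the same kind of error:

```r
# A data frame with 10 rows, standing in for the one-column table
df <- data.frame(EntrezID = as.character(1:10), stringsAsFactors = FALSE)

# Try to overwrite the column with only 3 values and capture the error message
msg <- tryCatch(
  {
    df[[1]] <- c("26155", "9636", "375790")
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

So the row counts in the message describe a size mismatch inside the assignment, not the number of genes that were mapped.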

When you run tbl <- mapply(tbl, FUN=as.numeric) you get back a matrix with a single column, which still isn't what the function expects, but it seems to handle it better. However, it is not required to provide the IDs as numbers (in fact they will be converted back to characters inside getBM() anyway).
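You can check the shape of that intermediate object yourself. The sketch below uses a small base data frame as a stand-in for the table read from Excel (an assumption, so it runs without readxl); the three IDs are the ones visible in the error message.

```r
# Stand-in for the one-column table read from the Excel file
tbl <- data.frame(
  EntrezID = c("26155", "9636", "375790"),
  stringsAsFactors = FALSE
)

# mapply() applies as.numeric to each column and simplifies to a matrix
m <- mapply(tbl, FUN = as.numeric)
print(is.matrix(m))
print(dim(m))

# The IDs can simply stay as a character vector
ids <- as.character(tbl$EntrezID)
print(ids)
```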

The better solution is to just select the appropriate column from your tbl and return it as a vector, which you could do via: entrezgene <- tbl$EntrezID
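The difference between the two ways of selecting the column can be sketched as follows. A base data frame stands in for the tibble here (an assumption, to keep the example free of extra packages), and the getBM() call is left as a comment because it needs biomaRt and a network connection.

```r
# Stand-in for the table read from the Excel file
tbl <- data.frame(
  EntrezID = c("26155", "9636", "375790"),
  stringsAsFactors = FALSE
)

# Single-bracket selection keeps the table structure (one column)
as_table <- tbl["EntrezID"]
print(class(as_table))

# $ (or [[ ]]) extracts the column itself as a plain vector
entrezgene <- tbl$EntrezID
same_vector <- tbl[["EntrezID"]]
print(class(entrezgene))

# The vector is what should go into the query, for example:
# genes <- getBM(
#     filters    = "entrezgene_id",
#     attributes = c("ensembl_gene_id", "entrezgene_id"),
#     values     = entrezgene,
#     mart       = ensembl)
```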

I will update biomaRt to check it's being given the correct type of input, and provide some guidance if that's not the case.
