Question

Converting HGNC to Ensembl and Entrez IDs using biomaRt

3.7 years ago

akansha.gitanjali ▴ 30

I have a vector of gene IDs:

head(data)
[1] "Ank2"   "Scg2"   "Nefh"   "Sgip1"  "Amph"   "Srcin1"

I used this:

require(biomaRt)
mart=useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
mapping <- getBM(attributes=c("hgnc_symbol","ensembl_gene_id","entrezgene_id"), filters = "hgnc_symbol", mart=mart, values=data, uniqueRows=TRUE, bmHeader = T)
Cache found

 mapping
[1] HGNC symbol                        Gene stable ID                    
[3] NCBI gene (formerly Entrezgene) ID
<0 rows> (or 0-length row.names)

Why does it say "Cache found"? What does it mean?

R biomart • 4.0k views

Updated 3.7 years ago by Kevin Blighe • written 3.7 years ago by akansha.gitanjali

score 2 · Accepted Answer · 2020-08-18

Hey,

Cache relates to this parameter of getBM():

useCache: Boolean indicating whether the results cache should be used.
          Setting to ‘FALSE’ will disable reading and writing of the
          cache. This argument is likely to disappear after the cache
          functionality has been tested more thoroughly.

It's basically data stored on your local drive from a previous biomaRt query, so "Cache found" just means that getBM() reused a saved result instead of querying the server again. Because this cache lives on disk, it persists across R sessions. Separately, it is good practice to restart your R session for every new analysis, to clear memory and avoid re-using old variables that lurk in your workspace.
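
For reference, a minimal sketch of how the cache can be inspected, cleared, or bypassed. It assumes a biomaRt version recent enough to export `biomartCacheInfo()` and `biomartCacheClear()`, plus a working network connection for the query; the variable names `symbols` and `fresh` are only illustrative.

```r
library(biomaRt)

# Show where the on-disk cache lives and how much it currently holds
biomartCacheInfo()

# Delete all cached results so the next query goes back to the server
biomartCacheClear()

# Alternatively, skip the cache for a single query (needs network access)
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
symbols <- c("Ank2", "Scg2", "Nefh")
fresh <- getBM(
  attributes = c("mgi_symbol", "ensembl_gene_id"),
  filters = "mgi_symbol",
  values = symbols,
  mart = mart,
  useCache = FALSE)
```

None of this is needed to fix the empty result; the cache only affects whether a query is re-sent, not what the query matches.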

The problem in this case is that you have mouse gene symbols but are filtering on them as if they were HGNC symbols. HGNC is specific to Homo sapiens (human... us), so none of your values match and you get 0 rows. For mouse, you will want MGI (mgi_symbol):

library(biomaRt)

# Connect to the Ensembl mart (US East mirror) and select the mouse dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL', host = 'useast.ensembl.org')
mart <- useDataset('mmusculus_gene_ensembl', mart)

data <- c('Ank2','Scg2','Nefh','Sgip1','Amph','Srcin1')

# Filter on MGI symbols (mouse) rather than HGNC symbols (human)
mapping <- getBM(
  attributes = c('mgi_symbol', 'ensembl_gene_id', 'entrezgene_id'),
  filters = 'mgi_symbol',
  mart = mart,
  values = data,
  uniqueRows = TRUE,
  bmHeader = TRUE)

mapping

  MGI symbol     Gene stable ID NCBI gene (formerly Entrezgene) ID
1       Amph ENSMUSG00000021314                             218038
2       Ank2 ENSMUSG00000032826                             109676
3       Nefh ENSMUSG00000020396                             380684
4       Scg2 ENSMUSG00000050711                              20254
5      Sgip1 ENSMUSG00000028524                              73094
6     Srcin1 ENSMUSG00000038453                              56013
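
As a quick sanity check, it can help to confirm that every input symbol came back with a mapping, since getBM() returns no row for a value it cannot match (which is how the original HGNC query ended up with 0 rows). The sketch below rebuilds the result table above by hand so that it runs without a network connection; the helper name `unmapped_symbols` is hypothetical and not part of biomaRt.

```r
# Result table copied from the output above (bmHeader = TRUE column names)
mapping <- data.frame(
  "MGI symbol" = c("Amph", "Ank2", "Nefh", "Scg2", "Sgip1", "Srcin1"),
  "Gene stable ID" = c("ENSMUSG00000021314", "ENSMUSG00000032826",
                       "ENSMUSG00000020396", "ENSMUSG00000050711",
                       "ENSMUSG00000028524", "ENSMUSG00000038453"),
  "NCBI gene (formerly Entrezgene) ID" = c(218038L, 109676L, 380684L,
                                           20254L, 73094L, 56013L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

data <- c("Ank2", "Scg2", "Nefh", "Sgip1", "Amph", "Srcin1")

# Hypothetical helper: input symbols that have no row in the mapping
unmapped_symbols <- function(symbols, mapping, column = "MGI symbol") {
  setdiff(symbols, mapping[[column]])
}

# Empty here, because all six symbols were mapped
unmapped_symbols(data, mapping)

# Symbols that map to more than one row (none in this table)
mapping[["MGI symbol"]][duplicated(mapping[["MGI symbol"]])]
```

Running the same check against the 0-row result from the question would return all six symbols, which makes the filter mismatch obvious straight away.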

Kevin