Question: Converting HGNC symbols to Ensembl and Entrez IDs using biomaRt
6 months ago, akansha.gitanjali wrote:

I have a vector of gene symbols:

head(data)
[1] "Ank2"   "Scg2"   "Nefh"   "Sgip1"  "Amph"   "Srcin1"

I used this:

require(biomaRt)
mart=useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
mapping <- getBM(attributes=c("hgnc_symbol","ensembl_gene_id","entrezgene_id"), filters = "hgnc_symbol", mart=mart, values=data, uniqueRows=TRUE, bmHeader = T)
Cache found

 mapping
[1] HGNC symbol                        Gene stable ID                    
[3] NCBI gene (formerly Entrezgene) ID
<0 rows> (or 0-length row.names)

Why does it say "Cache found"? What does it mean?

biomart R • 594 views
6 months ago, Kevin Blighe wrote:

Hey,

Cache relates to this parameter of getBM():

useCache: Boolean indicating whether the results cache should be used. Setting to ‘FALSE’ will disable reading and writing of the cache. This argument is likely to disappear after the cache functionality has been tested more thoroughly.

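It's basically data that is stored on your local drive from when you previously ran the same query with biomaRt. "Cache found" is only an informational message: the result was read from that local cache instead of being fetched from the Ensembl server again. Because the cache lives on disk, restarting R does not clear it; set useCache = FALSE in getBM() if you want to bypass it. It is still good practice to restart your R session for every new analysis, in order to free memory and avoid re-using old variables that lurk in your workspace.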
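
If you want to look at the cache or empty it, something along these lines should work. This is a minimal sketch, assuming a reasonably recent biomaRt that exports biomartCacheInfo() and biomartCacheClear(); check the help pages for your installed version.

```r
library(biomaRt)

# Print where the on-disk cache is stored and how much it currently holds
biomartCacheInfo()

# Delete the cached results; the next getBM() call will then query the
# Ensembl server again instead of printing "Cache found"
biomartCacheClear()
```

Clearing the cache is rarely necessary. It mainly helps when you suspect that a stale or incomplete result was stored.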

The problem in this case is that you have mouse gene symbols but are querying them as if they were HGNC symbols, which is why the query returns 0 rows. HGNC is specific for Homo sapiens (human... us) - you will want MGI (mgi_symbol):

library(biomaRt)

# Connect to the Ensembl mart and select the mouse gene dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL', host = 'useast.ensembl.org')
mart <- useDataset('mmusculus_gene_ensembl', mart)

data <- c('Ank2','Scg2','Nefh','Sgip1','Amph','Srcin1')

# Filter on MGI symbols (mouse), not HGNC symbols (human)
mapping <- getBM(
  attributes = c('mgi_symbol', 'ensembl_gene_id', 'entrezgene_id'),
  filters = 'mgi_symbol',
  mart = mart,
  values = data,
  uniqueRows = TRUE,
  bmHeader = TRUE)

mapping

  MGI symbol     Gene stable ID NCBI gene (formerly Entrezgene) ID
1       Amph ENSMUSG00000021314                             218038
2       Ank2 ENSMUSG00000032826                             109676
3       Nefh ENSMUSG00000020396                             380684
4       Scg2 ENSMUSG00000050711                              20254
5      Sgip1 ENSMUSG00000028524                              73094
6     Srcin1 ENSMUSG00000038453                              56013

Kevin

written 6 months ago by Kevin Blighe

Thanks Kevin for pointing out the species error. It works fine now. My input file has 7289 genes, with some duplicates. After conversion, getBM had removed the duplicate IDs and returned 4731 IDs. I do not want it to drop the duplicates, as I will be combining the output with my original dataset for further downstream analysis. Is there any way to get around that with getBM?

written 6 months ago by akansha.gitanjali

Did you try uniqueRows = FALSE? Generally, with biomaRt, extra work is required after you perform the initial mapping. You will note that biomaRt does not even return the genes in the same order in which they were submitted.
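
One way to do that extra work is to look each input gene up in the mapping table with match(), which restores both the input order and the duplicates. This is only a sketch: the mapping table is typed in from the output above with default column names (i.e. without bmHeader = TRUE), and the genes vector is a made-up input containing a duplicate and an unmapped symbol.

```r
# Mapping table as returned by getBM() above, entered by hand for illustration
mapping <- data.frame(
  mgi_symbol      = c("Amph", "Ank2", "Nefh", "Scg2", "Sgip1", "Srcin1"),
  ensembl_gene_id = c("ENSMUSG00000021314", "ENSMUSG00000032826",
                      "ENSMUSG00000020396", "ENSMUSG00000050711",
                      "ENSMUSG00000028524", "ENSMUSG00000038453"),
  entrezgene_id   = c(218038, 109676, 380684, 20254, 73094, 56013),
  stringsAsFactors = FALSE
)

# Hypothetical input: one duplicate and one symbol with no match
genes <- c("Scg2", "Ank2", "Scg2", "NotAGene")

# For each input gene, find the row of its first hit in the mapping table;
# genes without a hit get NA
idx <- match(genes, mapping$mgi_symbol)

# One output row per input gene, in the original order
annotated <- data.frame(
  input_symbol = genes,
  mapping[idx, c("ensembl_gene_id", "entrezgene_id")],
  row.names = NULL,
  stringsAsFactors = FALSE
)
annotated
```

Note that match() keeps only the first hit per symbol, so if a symbol maps to several Ensembl or Entrez IDs, the others are silently dropped.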

For 1-to-1 mapping, org.Mm.eg.db may be a better option. See step 3, here: https://support.bioconductor.org/p/130727/#130733
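
A rough sketch of that approach, assuming org.Mm.eg.db and AnnotationDbi are installed. The genes vector is a hypothetical input with a duplicate, and multiVals = "first" is my choice here for forcing a 1-to-1 result, not something taken from the linked post. As far as I know, mapIds() returns one value per input key, duplicates included, but the code checks that rather than relying on it.

```r
library(AnnotationDbi)
library(org.Mm.eg.db)

# Hypothetical input with a duplicated symbol
genes <- c("Ank2", "Scg2", "Ank2", "Nefh")

# Look up one Ensembl ID and one Entrez ID per symbol;
# multiVals = "first" keeps only the first hit when a symbol maps to several IDs
ensembl <- mapIds(org.Mm.eg.db, keys = genes, keytype = "SYMBOL",
                  column = "ENSEMBL", multiVals = "first")
entrez  <- mapIds(org.Mm.eg.db, keys = genes, keytype = "SYMBOL",
                  column = "ENTREZID", multiVals = "first")

# Verify that the output lines up with the input before combining
stopifnot(length(ensembl) == length(genes), length(entrez) == length(genes))

annotated <- data.frame(
  symbol  = genes,
  ensembl = unname(ensembl),
  entrez  = unname(entrez),
  stringsAsFactors = FALSE
)
annotated
```

Because the result has the same length and order as the input, it can be bound directly onto the original dataset as extra columns.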

written 6 months ago by Kevin Blighe

uniqueRows = FALSE doesn't do it either. But yes, the AnnotationDbi package gives the output the way I want it. Thank you.

written 6 months ago by akansha.gitanjali