Question: Get gene symbols from gene ids for mouse using BioMart
3.0 years ago by
nikitavlassenko70 wrote:

I am trying to get gene symbols for the gene IDs that I have for mouse datasets. The gene IDs look like this: 0610009B22Rik. The code I am trying to use is the following:

ensembl <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
mouse_gene_ids <- dataset[, 1]
foo <- getBM(attributes=c('ensembl_gene_id',
                      'external_gene_name'),
         filters = 'genedb',
         values = mouse_gene_ids,
         mart = ensembl)

I get zero results as output after the query runs. I guess the filters parameter is wrong. Any suggestions would be greatly appreciated.

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by nikitavlassenko70
3.0 years ago by
Mike Smith1.6k
EMBL Heidelberg / de.NBI
Mike Smith1.6k wrote:

The filter you need is mgi_symbol, e.g.

library(biomaRt)

ensembl <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
mouse_gene_ids  <- "0610009B22Rik"

foo <- getBM(attributes=c('ensembl_gene_id',
                          'external_gene_name'),
             filters = 'mgi_symbol',
             values = mouse_gene_ids,
             mart = ensembl)

Here's the result:

> foo
     ensembl_gene_id external_gene_name
1 ENSMUSG00000007777      0610009B22Rik

I find the best way to choose the correct filter is to start with the Ensembl BioMart web interface, use the examples in the Filters -> external references ID list dropdown to find the format I'm using, and then hit the XML button near the top. This will let you see the filter name required by biomaRt.
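
The filter names can also be searched from within R, since listFilters() returns a data frame with name and description columns. The sketch below is only an illustration: find_filters() is a hypothetical helper, not part of biomaRt, and the toy table stands in for the real listFilters(ensembl) output so the example runs without a network connection (its descriptions are made up for the example).

```r
# Hypothetical helper (not a biomaRt function): keep the rows of a filter
# table whose name or description contains the keyword, ignoring case.
find_filters <- function(filter_table, keyword) {
  hit <- grepl(keyword, filter_table$name, ignore.case = TRUE) |
    grepl(keyword, filter_table$description, ignore.case = TRUE)
  filter_table[hit, , drop = FALSE]
}

# Toy stand-in for listFilters(ensembl); descriptions are illustrative only.
toy_filters <- data.frame(
  name = c("ensembl_gene_id", "entrezgene_id", "mgi_symbol"),
  description = c("Gene stable ID(s)", "NCBI gene ID(s)", "MGI symbol(s)"),
  stringsAsFactors = FALSE
)

symbol_filters <- find_filters(toy_filters, "symbol")
print(symbol_filters)

# With a live mart object the call would look like this:
# find_filters(listFilters(ensembl), "symbol")
```

The name column of the matching row is the string to pass as the filters argument of getBM().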

ADD COMMENTlink written 3.0 years ago by Mike Smith1.6k
library(biomaRt)
listMarts()
ensembl=useMart("ensembl")
datasets <- listDatasets(ensembl)
head(datasets)
ensembl = useDataset("mmusculus_gene_ensembl", mart = ensembl)
entrzID=c("14455", "80904", "94275")
filters = listFilters(ensembl)
filters[1:50,]
getBM(attributes = c("ensembl_gene_id", "external_gene_name"), filters = "mgi_symbol", values = entrzID, mart = ensembl)

output:

[1] ensembl_gene_id    external_gene_name
<0 rows> (or 0-length row.names)

Why can't I get any gene symbols?

ADD REPLYlink modified 18 months ago • written 18 months ago by Kai_Qi100

Perhaps you want to try the entrezgene_id filter instead?

getBM(attributes = c("ensembl_gene_id", "external_gene_name"), 
       filters = "entrezgene_id", 
       values = entrzID, 
       mart = ensembl)

     ensembl_gene_id external_gene_name
1 ENSMUSG00000040415               Dtx3
2 ENSMUSG00000025151             Maged1
ADD REPLYlink written 18 months ago by Mike Smith1.6k
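
The empty results in this thread come from a filter that does not match the type of ID being passed. As a rough sketch of how the choice could be automated, the hypothetical helper below guesses the filter name from the shape of the IDs. The patterns are assumptions based only on the examples in this thread (ENSMUSG-prefixed Ensembl IDs, all-digit Entrez IDs, anything else treated as an MGI symbol), so treat the result as a hint rather than a guarantee.

```r
# Hypothetical helper: guess a biomaRt filter name from the look of the IDs.
# The fallback to "mgi_symbol" is an assumption, not a validated rule.
guess_filter <- function(ids) {
  ids <- as.character(ids)
  if (all(grepl("^ENSMUSG[0-9]+(\\.[0-9]+)?$", ids))) {
    "ensembl_gene_id"   # Ensembl mouse gene IDs, with or without a version suffix
  } else if (all(grepl("^[0-9]+$", ids))) {
    "entrezgene_id"     # purely numeric IDs
  } else {
    "mgi_symbol"        # everything else is assumed to be a symbol
  }
}

entrez_guess  <- guess_filter(c("14455", "80904", "94275"))
symbol_guess  <- guess_filter("0610009B22Rik")
ensembl_guess <- guess_filter("ENSMUSG00000051951.5")
print(c(entrez_guess, symbol_guess, ensembl_guess))
```

Note that a versioned Ensembl ID such as ENSMUSG00000051951.5 still needs its version suffix removed before it is passed to getBM().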

Yes. I tried it and it works.

Thank you so much for the help!

ADD REPLYlink written 18 months ago by Kai_Qi100

Hello, I tried to follow the previous posts and everything ran, but I did not get anything back as a result. My code is below:

library(biomaRt)
ensembl <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
genes_ids <- c('ENSMUSG00000051951.5', 'ENSMUSG00000025900.12', 'ENSMUSG00000025902.13')
gs_heatdata <- getBM(attributes = c("external_gene_name"), filters = "mgi_symbol", values = genes_ids, mart = ensembl)

ADD REPLYlink modified 21 hours ago • written 21 hours ago by tommaso.gastaldi0

Hi, you need to remove the trailing version numbers (the part after the dot) from the gene IDs. Also, the value for filters should be ensembl_gene_id. Please try this:

library(biomaRt)
ensembl <- useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
genes_ids <- sub('\\.[0-9]*$', '',
  c('ENSMUSG00000051951.5', 'ENSMUSG00000025900.12', 'ENSMUSG00000025902.13'))
gs_heatdata <- getBM(
  attributes = c('external_gene_name', 'mgi_symbol','ensembl_gene_id'),
  filters = 'ensembl_gene_id',
  values = genes_ids,
  mart = ensembl)

gs_heatdata
  external_gene_name mgi_symbol    ensembl_gene_id
1                Rp1        Rp1 ENSMUSG00000025900
2              Sox17      Sox17 ENSMUSG00000025902
3               Xkr4       Xkr4 ENSMUSG00000051951
ADD REPLYlink written 20 hours ago by Kevin Blighe71k

It works perfectly, but I did not understand how you managed it:
- Does the trailing number stand for the 0s before the actual ID?
- Could you explain in particular what sub('\\.[0-9]*$', '', refers to?

Thank you a lot!

ADD REPLYlink written 19 hours ago by tommaso.gastaldi0

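That is a regular expression. It says: substitute a literal period followed by any run of digits (0 to 9) at the end of the string with nothing (i.e. delete it). The trailing number is the version suffix of the Ensembl ID, not a count of the zeros.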
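
A quick way to see what the expression does is to run it on the IDs from the post above. This is a minimal sketch in base R with no biomaRt query involved; the variable names are only illustrative.

```r
# Versioned Ensembl IDs as they appear in the row names of the dataset
versioned <- c("ENSMUSG00000051951.5",
               "ENSMUSG00000025900.12",
               "ENSMUSG00000025902.13")

# "\\."     a literal period (escaped, because a bare . matches any character)
# "[0-9]*"  zero or more digits
# "$"       anchored at the end of the string
stable <- sub("\\.[0-9]*$", "", versioned)
print(stable)
# [1] "ENSMUSG00000051951" "ENSMUSG00000025900" "ENSMUSG00000025902"

# An ID that has no version suffix is returned unchanged
unversioned <- sub("\\.[0-9]*$", "", "ENSMUSG00000033845")
print(unversioned)
# [1] "ENSMUSG00000033845"
```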

ADD REPLYlink modified 19 hours ago • written 19 hours ago by GenoMax96k

Sorry, I forgot one more question. How can I make the code "cleaner"? The final output shows two columns that are the same, 'external_gene_name' and 'mgi_symbol'.

Thank you!

ADD REPLYlink written 19 hours ago by tommaso.gastaldi0

Change the following line

attributes = c('external_gene_name', 'mgi_symbol','ensembl_gene_id')

to

attributes = c('external_gene_name', 'ensembl_gene_id')

Or keep mgi_symbol if you want to keep that instead.

ADD REPLYlink modified 19 hours ago • written 19 hours ago by GenoMax96k

I tried it with my whole dataset but it did not work. All I get back is an empty table with external_gene_name and ensembl_gene_id as headers.

library(biomaRt)
ensembl <- useMart("ensembl",dataset="mmusculus_gene_ensembl")
genes_ids <- sub('\\.[0-9]*$', '',  row.names(heatdata))
gs_heatdata <- getBM(attributes = c('external_gene_name', 'ensembl_gene_id'), 
                 filters = "mgi_symbol",
                 values = genes_ids,
                 mart = ensembl)

head(heatdata)
                      T0medium T0medium T0medium    T0LAL    T0LAL    T0LAL    6hLAL    6hLAL    6hLAL   6hIMQ
ENSMUSG00000051951.5  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000025900.12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000025902.13 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000033845.13 8.635869 8.717134 8.644194 8.688051 8.729801 8.719839 8.522753 8.451425 8.588430 8.93282
ENSMUSG00000025903.14 9.244627 9.269090 9.357344 9.148911 9.297785 9.352155 9.265217 9.099127 9.255727 9.28542
ENSMUSG00000104217.1  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
                         6hIMQ    6hIMQ   16hLAL   16hLAL   16hLAL   16hIMQ   16hIMQ   16hIMQ
ENSMUSG00000051951.5  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000025900.12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000025902.13 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000033845.13 8.838776 8.843039 8.541431 8.565437 8.534634 9.114412 9.122216 9.117485
ENSMUSG00000025903.14 9.392362 9.217806 9.207043 9.377954 9.266217 9.221498 9.238627 9.220453
ENSMUSG00000104217.1  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ADD REPLYlink modified 17 hours ago • written 17 hours ago by tommaso.gastaldi0

Hi, the converted IDs are contained in gs_heatdata. You then have to align these to the rownames of heatdata, and then replace them with the external gene IDs (MGI symbols).

ADD REPLYlink written 16 hours ago by Kevin Blighe71k

Hi, how can I align them? Which function should I use? How can I then replace them with the external gene IDs? Should I first move the row.names of heatdata into the first column and then somehow combine the data frame gs_heatdata with the data frame heatdata? Thank you a lot! :)

ADD REPLYlink modified 3 hours ago • written 3 hours ago by tommaso.gastaldi0
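
One possible way to do the alignment described above is match() from base R, which gives, for each row name of heatdata, the position of the same ID in the getBM() result. The sketch below uses toy objects so it runs without a network connection: the small heatdata matrix and the gs_heatdata data frame are stand-ins for the real ones, and the third ID is deliberately left out of the lookup table to show a fallback. Note also that the query posted above passes Ensembl IDs with filters = "mgi_symbol", which would explain the empty table; for these IDs the filter should be ensembl_gene_id.

```r
# Toy stand-in for the expression matrix (values are illustrative)
heatdata <- matrix(
  c(0.0, 0.0, 8.6,
    0.0, 0.0, 8.7),
  nrow = 3,
  dimnames = list(
    c("ENSMUSG00000051951.5", "ENSMUSG00000025900.12", "ENSMUSG00000033845.13"),
    c("T0medium", "T0LAL")
  )
)

# Toy stand-in for the getBM() result; rows can come back in any order
gs_heatdata <- data.frame(
  external_gene_name = c("Rp1", "Xkr4"),
  ensembl_gene_id = c("ENSMUSG00000025900", "ENSMUSG00000051951"),
  stringsAsFactors = FALSE
)

# Strip the version suffix, then look up each row's position in the result
stable_ids <- sub("\\.[0-9]*$", "", rownames(heatdata))
idx <- match(stable_ids, gs_heatdata$ensembl_gene_id)
symbols <- gs_heatdata$external_gene_name[idx]

# Keep the Ensembl ID where no symbol was returned
missing <- is.na(symbols)
symbols[missing] <- stable_ids[missing]

# make.unique() guards against two IDs mapping to the same symbol
rownames(heatdata) <- make.unique(symbols)
print(heatdata)
```

With match() the original row order of heatdata is preserved and nothing has to be merged or re-sorted afterwards, which is why it is used here instead of merge().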