Get gene symbols from gene ids for mouse using BioMart
1
3
Entering edit mode
5.1 years ago

I am trying to get gene symbols for gene IDs that I got from mouse datasets. The gene IDs look like this: 0610009B22Rik. The code I am trying to use is the following:

ensembl <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
mouse_gene_ids <- dataset[, 1]
foo <- getBM(attributes=c('ensembl_gene_id',
'external_gene_name'),
filters = 'genedb',
values = mouse_gene_ids,
mart = ensembl)


I get zero results as output after the query runs. I guess the filters parameter is wrong. Any suggestions would be greatly appreciated.

BioMart mouse gene ids gene symbols • 17k views
9
Entering edit mode
5.1 years ago
Mike Smith ★ 2.0k

The filter you need is mgi_symbol, e.g.:

library(biomaRt)

ensembl <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
mouse_gene_ids  <- "0610009B22Rik"

foo <- getBM(attributes=c('ensembl_gene_id',
'external_gene_name'),
filters = 'mgi_symbol',
values = mouse_gene_ids,
mart = ensembl)


Here's the result:

> foo
     ensembl_gene_id external_gene_name
1 ENSMUSG00000007777      0610009B22Rik


I find the best way to choose the correct filter is to start with the Ensembl BioMart web interface, use the examples in the Filters -> external references ID list dropdown to find the format I'm using, and then hit the XML button near the top. This will let you see the filter name required by biomaRt.
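
As a complement to the web interface, filter names can also be searched from within R. The sketch below is only an illustration: find_filters is a hypothetical helper (not part of biomaRt) that searches the name and description columns of a table such as the one returned by listFilters(). The small stand-in table exists so the example runs offline; its three filter names come from this thread, while the descriptions are placeholders.

```r
# Hypothetical helper (not a biomaRt function): search a filter table,
# such as the one returned by biomaRt::listFilters(), for a pattern.
find_filters <- function(filters, pattern) {
  hits <- grepl(pattern, filters$name, ignore.case = TRUE) |
    grepl(pattern, filters$description, ignore.case = TRUE)
  filters[hits, , drop = FALSE]
}

# Stand-in table so the example runs without a network connection.
# The descriptions are placeholders, not copied from Ensembl.
demo_filters <- data.frame(
  name = c("ensembl_gene_id", "entrezgene_id", "mgi_symbol"),
  description = c("Ensembl gene ID", "Entrez gene ID", "MGI symbol"),
  stringsAsFactors = FALSE
)

mgi_hits <- find_filters(demo_filters, "mgi")
print(mgi_hits$name)

# With a live mart (requires biomaRt and network access):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# find_filters(listFilters(ensembl), "mgi")
```

Against a live mart the table is much longer, so searching by a fragment of the ID type ("mgi", "entrez", "ensembl") is usually quicker than paging through filters[1:50,].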

0
Entering edit mode
library(biomaRt)
listMarts()
ensembl=useMart("ensembl")
datasets <- listDatasets(ensembl)
ensembl = useDataset("mmusculus_gene_ensembl", mart = ensembl)
entrzID=c("14455", "80904", "94275")
filters = listFilters(ensembl)
filters[1:50,]
getBM(attributes = c("ensembl_gene_id", "external_gene_name"), filters = "mgi_symbol", values = entrzID, mart = ensembl)


output:

[1] ensembl_gene_id    external_gene_name
<0 rows> (or 0-length row.names)


Why can't I get any gene symbols?

1
Entering edit mode

Perhaps you want to try the entrezgene_id filter instead?

getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
filters = "entrezgene_id",
values = entrzID,
mart = ensembl)

     ensembl_gene_id external_gene_name
1 ENSMUSG00000040415               Dtx3
2 ENSMUSG00000025151             Maged1
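
The empty results in this thread come from a filter that does not match the type of ID being supplied. As a rough sanity check, the ID format can be inspected before calling getBM(). The function below is a hypothetical heuristic, not part of biomaRt: it assumes only the three ID styles seen in this thread and falls back to mgi_symbol for anything it does not recognise.

```r
# Hypothetical heuristic (an assumption, not a biomaRt feature):
# guess a biomaRt filter name from the look of mouse gene IDs.
guess_filter <- function(ids) {
  ids <- as.character(ids)
  if (all(grepl("^ENSMUSG[0-9]+(\\.[0-9]+)?$", ids))) {
    # Ensembl mouse gene IDs, with or without a version suffix
    "ensembl_gene_id"
  } else if (all(grepl("^[0-9]+$", ids))) {
    # Purely numeric IDs are treated as Entrez gene IDs
    "entrezgene_id"
  } else {
    # Fallback assumption: anything else is an MGI symbol
    "mgi_symbol"
  }
}

entrez_guess  <- guess_filter(c("14455", "80904", "94275"))
symbol_guess  <- guess_filter("0610009B22Rik")
ensembl_guess <- guess_filter("ENSMUSG00000051951.5")
print(c(entrez_guess, symbol_guess, ensembl_guess))
```

A guess like this only picks the filter name; versioned Ensembl IDs still need their suffix removed before they are passed as values.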

0
Entering edit mode

Yes. I tried it and it works.

Thank you so much for the help!

0
Entering edit mode

Hello, I tried to follow the previous posts and everything ran, but I did not get anything back as a result. My code is below:

library(biomaRt)
ensembl <- useMart("ensembl",dataset="mmusculus_gene_ensembl")
genes_ids <- c('ENSMUSG00000051951.5', 'ENSMUSG00000025900.12', 'ENSMUSG00000025902.13')
gs_heatdata <- getBM(attributes = c("external_gene_name"), filters = "mgi_symbol", values = genes_ids, mart = ensembl)

1
Entering edit mode

Hi, you need to remove the trailing numbers from the gene IDs. Also, the value for filters should be ensembl_gene_id. Please try this:

library(biomaRt)
ensembl <- useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
genes_ids <- sub('\\.[0-9]*$', '', c('ENSMUSG00000051951.5', 'ENSMUSG00000025900.12', 'ENSMUSG00000025902.13'))
gs_heatdata <- getBM(
  attributes = c('external_gene_name', 'mgi_symbol', 'ensembl_gene_id'),
  filters = 'ensembl_gene_id',
  values = genes_ids,
  mart = ensembl)

gs_heatdata
  external_gene_name mgi_symbol    ensembl_gene_id
1                Rp1        Rp1 ENSMUSG00000025900
2              Sox17      Sox17 ENSMUSG00000025902
3               Xkr4       Xkr4 ENSMUSG00000051951

0
Entering edit mode

It works perfectly, but I did not understand how you managed it:

- the trailing number stands for the 0s before the actual ID?
- could you explain to me what sub('\\.[0-9]*$', '', refers to in particular?

Thank you a lot!

2
Entering edit mode

That is a regular expression. It says: substitute a period followed by any digits (0 to 9) at the end of the string with nothing (i.e. delete it).
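
To make the pattern concrete, here is a small sketch using the three IDs from this thread. It assumes the part after the dot is the Ensembl version suffix, which is why removing it leaves the stable ID. In the pattern, \\. matches a literal period, [0-9]* matches zero or more digits, and $ anchors the match to the end of the string.

```r
# Versioned Ensembl gene IDs, as used earlier in the thread
versioned_ids <- c("ENSMUSG00000051951.5",
                   "ENSMUSG00000025900.12",
                   "ENSMUSG00000025902.13")

# Remove a trailing ".<digits>" from each ID
stable_ids <- sub("\\.[0-9]*$", "", versioned_ids)
print(stable_ids)
# [1] "ENSMUSG00000051951" "ENSMUSG00000025900" "ENSMUSG00000025902"

# An ID without a suffix has nothing to match and is left unchanged
unchanged <- sub("\\.[0-9]*$", "", "ENSMUSG00000051951")
print(unchanged)
# [1] "ENSMUSG00000051951"
```

The leading zeros inside the ID are untouched; only the dot and the digits after it are removed.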

0
Entering edit mode

Sorry, I forgot one more question. How can I make the code "cleaner"? The output shows two columns that are the same, 'external_gene_name' and 'mgi_symbol'.

Thank you!

2
Entering edit mode

Change the following line

attributes = c('external_gene_name', 'mgi_symbol','ensembl_gene_id')


to

attributes = c('external_gene_name', 'ensembl_gene_id')


Or keep mgi_symbol if you want to keep that instead.

0
Entering edit mode

I tried with my whole dataset but it did not work. I just get an empty table back, with external_gene_name and ensembl_gene_id as headers.

library(biomaRt)
ensembl <- useMart("ensembl",dataset="mmusculus_gene_ensembl")
genes_ids <- sub('\\.[0-9]*', '', row.names(heatdata))
gs_heatdata <- getBM(attributes = c('external_gene_name', 'ensembl_gene_id'), filters = "mgi_symbol", values = genes_ids, mart = ensembl)

head(heatdata)
                      T0medium T0medium T0medium    T0LAL    T0LAL    T0LAL    6hLAL    6hLAL    6hLAL   6hIMQ
ENSMUSG00000051951.5  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000025900.12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000025902.13 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
ENSMUSG00000033845.13 8.635869 8.717134 8.644194 8.688051 8.729801 8.719839 8.522753 8.451425 8.588430 8.93282
ENSMUSG00000025903.14 9.244627 9.269090 9.357344 9.148911 9.297785 9.352155 9.265217 9.099127 9.255727 9.28542
ENSMUSG00000104217.1  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
                         6hIMQ    6hIMQ   16hLAL   16hLAL   16hLAL   16hIMQ   16hIMQ   16hIMQ
ENSMUSG00000051951.5  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000025900.12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000025902.13 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000033845.13 8.838776 8.843039 8.541431 8.565437 8.534634 9.114412 9.122216 9.117485
ENSMUSG00000025903.14 9.392362 9.217806 9.207043 9.377954 9.266217 9.221498 9.238627 9.220453
ENSMUSG00000104217.1  0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

0
Entering edit mode

Hi, the converted IDs are contained in gs_heatdata. You then have to align these to the rownames of heatdata, and then replace them with the external gene IDs (MGI symbols).

0
Entering edit mode

Hi, how can I align them? Which function should I use? How can I then replace them with the external gene IDs? Should I first convert the row.names of heatdata into the first column and then somehow combine the df gs_heatdata with the df heatdata? Thank you a lot! :)

0
Entering edit mode

Hi, please take a look at functions such as which() and match(), and other functions from dplyr (package) for matching data-frames. A quick example:

array1 <- c('a','b','c','d','e','f','g')
array2 <- c('e','f','g','a','b','c','d')
idx <- match(array1, array2)
data.frame(array1 = array1, array2 = array2[idx])
  array1 array2
1      a      a
2      b      b
3      c      c
4      d      d
5      e      e
6      f      f
7      g      g

0
Entering edit mode

Hi, I tried for now with match() but I think it did not work.

matched_heatdata <- match(gs_heatdata, heatdata)
matched_heatdata
[1] NA NA

0
Entering edit mode

match() returns the indices [in heatdata] of the elements of gs_heatdata. What you likely need is:

idx <- match(
  sub('\\.[0-9]*', '', rownames(heatdata)),
  gs_heatdata$ensembl_gene_id)
gs_heatdata <- gs_heatdata[idx,]
all(sub('\\.[0-9]*$', '', rownames(heatdata)) == gs_heatdata$ensembl_gene_id)
# must return TRUE

0
Entering edit mode

OK, I will try this. Just for me to understand: can I also just use the previous genes_ids, or do I have to put the entire sub('\\.[0-9]*$', '', rownames(heatdata)) in match() and then in all()? Thank you!!

0
Entering edit mode

It returned this:

idx <- match(sub('\\.[0-9]*$', '', rownames(heatdata)),gs_heatdata$ensembl_gene_id)
gs_heatdata <- gs_heatdata[idx,]
all(sub('\\.[0-9]*$', '', rownames(heatdata)) == gs_heatdata$ensembl_gene_id)
[1] NA

1
Entering edit mode

I think I found the problem, and it was right in front of me: the filter was wrong. I had to use filters = "ensembl_gene_id" instead of filters = "mgi_symbol". Now gs_heatdata looks good:

     external_gene_name    ensembl_gene_id
3079               Xkr4 ENSMUSG00000051951
424                 Rp1 ENSMUSG00000025900
425               Sox17 ENSMUSG00000025902
1951             Mrpl15 ENSMUSG00000033845
426              Lypla1 ENSMUSG00000025903
4321            Gm37988 ENSMUSG00000104217


But if I proceed with the previous code, I still get NA:

idx <- match(sub('\\.[0-9]*$', '', rownames(heatdata)), gs_heatdata$ensembl_gene_id)
gs_heatdata <- gs_heatdata[idx,]
all(sub('\\.[0-9]*$', '', rownames(heatdata)) == gs_heatdata$ensembl_gene_id)
[1] NA
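
One likely explanation for the NA (an assumption, since the full data are not shown): some row names of heatdata have no entry in gs_heatdata, so match() returns NA for them, and all() yields NA when every comparison is either TRUE or NA. The sketch below reproduces this with toy stand-ins; heat_ids replaces rownames(heatdata), and the third ID is made up so that it has no match.

```r
# Toy stand-ins for the objects in this thread (values are illustrative).
# The third ID is invented and deliberately absent from gs_heatdata.
heat_ids <- c("ENSMUSG00000051951.5",
              "ENSMUSG00000025900.12",
              "ENSMUSG99999999999.1")
gs_heatdata <- data.frame(
  external_gene_name = c("Rp1", "Xkr4"),
  ensembl_gene_id = c("ENSMUSG00000025900", "ENSMUSG00000051951"),
  stringsAsFactors = FALSE
)

stable_ids <- sub("\\.[0-9]*$", "", heat_ids)
idx <- match(stable_ids, gs_heatdata$ensembl_gene_id)  # NA where no match
aligned <- gs_heatdata[idx, ]

# The comparison is TRUE, TRUE, NA, so all() gives NA rather than TRUE
check <- all(stable_ids == aligned$ensembl_gene_id)
print(check)

# Count the unmatched IDs instead of relying on all()
n_missing <- sum(is.na(idx))
print(n_missing)

# Use the gene symbol where one was found, else keep the stable ID
new_names <- ifelse(is.na(idx), stable_ids, aligned$external_gene_name)
print(new_names)
```

If new_names is then assigned back as row names, duplicated symbols would have to be handled first, for example with make.unique(), because row names of a data frame must be unique.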