Question: biomaRt mmusculus_gene_ensembl dataset
igor (7.3k), United States, wrote 14 months ago:

For several years, I used the following code to get mouse gene annotations using the biomaRt Bioconductor package:

library(biomaRt)
mart = useMart(host="useast.ensembl.org", biomart="ENSEMBL_MART_ENSEMBL", dataset="mmusculus_gene_ensembl", verbose=FALSE)

Recently, it stopped working with the following error:

Error in checkDataset(dataset = dataset, mart = mart) : 
  The given dataset:  mmusculus_gene_ensembl , is not valid.  Correct dataset names can be obtained with the listDatasets() function.

I followed the advice and ran listDatasets(). Not surprisingly, mmusculus_gene_ensembl is not there. Other common species (such as fly and rat) are still there. What happened to the mouse dataset? I tried searching to see if there was some announcement, but couldn't find any mention.


I am getting the same error when using hsapiens_gene_ensembl. Any help would be appreciated.

written 14 months ago by a.khadija (10)

igor wrote: "I followed the advice and ran listDatasets(). Not surprisingly, mmusculus_gene_ensembl is not there."

It seems to be there currently:

ensembl=useMart("ENSEMBL_MART_ENSEMBL")
listDatasets(ensembl)[1]

dataset
1 amelanoleuca_gene_ensembl
2 dordii_gene_ensembl
...
25 ecaballus_gene_ensembl
26 mmusculus_gene_ensembl
27 oanatinus_gene_ensembl
...
33 sboliviensis_gene_ensembl

Another post also refers to this: bioconductor

written 14 months ago by erwan.scaon (630)

I just tried this:

ensembl = useMart("ENSEMBL_MART_ENSEMBL")
dim(listDatasets(ensembl))
dim(listDatasets(ensembl))
dim(listDatasets(ensembl))

The output is different every time, so you don't always get the same species. Very odd.

written 14 months ago by igor (7.3k)

Mike Smith (1.1k), EMBL Heidelberg / de.NBI, wrote 14 months ago:

This issue was caused by the introduction of new primate species whose description fields include apostrophes (e.g. Aotus nancymaae, Nancy Ma's night monkey). biomaRt was unable to process these correctly.

This has now been patched, and if you update your version of biomaRt using BiocInstaller::biocLite('biomaRt') you should get the expected behaviour again. You will need version 2.34.1 or greater to work with Ensembl 91.

> mart = useMart(host="useast.ensembl.org", 
                 biomart="ENSEMBL_MART_ENSEMBL", 
                 dataset="mmusculus_gene_ensembl")
> mart
Object of class 'Mart':
  Using the ENSEMBL_MART_ENSEMBL BioMart database
  Using the mmusculus_gene_ensembl dataset

> packageVersion('biomaRt')
[1] ‘2.34.1’
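
As a rough sketch of checking for the patched release, the comparison below uses the 2.34.1 threshold quoted in this answer. The helper name `is_too_old` is hypothetical and not part of biomaRt.

```r
# Minimum biomaRt version quoted above for working with Ensembl 91.
required <- "2.34.1"

# Hypothetical helper: TRUE when a version string is older than the minimum.
is_too_old <- function(installed, minimum = required) {
  package_version(installed) < package_version(minimum)
}

# Only inspect the installed copy if biomaRt is actually available.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("biomaRt"))
  if (is_too_old(installed)) {
    message("biomaRt ", installed, " predates the fix; please update.")
  } else {
    message("biomaRt ", installed, " should include the fix.")
  }
} else {
  message("biomaRt is not installed.")
}
```

Comparing `package_version` objects rather than raw strings avoids mistakes such as treating "2.9.0" as newer than "2.34.1".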

It's odd that listDatasets() returns a different number of datasets every time you call it. Shouldn't it run into the error in the same place every time?

written 14 months ago by igor (7.3k)

listDatasets() just grabs its information from http://www.ensembl.org/biomart/martservice?type=datasets&mart=ENSEMBL_MART_ENSEMBL (or the equivalent for other marts). If you refresh that page, you'll get the list in a different order each time.

Hence the entries containing the apostrophes appeared in different places with each query. You were getting back all the entries that appeared before the first apostrophe, and so it broke in a slightly different place each time.
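
The effect described above can be modelled with a toy example. This is not biomaRt's parsing code; `parse_until_apostrophe` is a hypothetical stand-in that simply stops at the first entry containing an apostrophe, and the description strings are made up for illustration.

```r
# Illustrative dataset descriptions (assumed text, not the real server output).
descriptions <- c(
  mmusculus_gene_ensembl     = "Mouse genes",
  anancymaae_gene_ensembl    = "Ma's night monkey genes",
  rnorvegicus_gene_ensembl   = "Rat genes",
  dmelanogaster_gene_ensembl = "Fruitfly genes"
)

# Toy model of the symptom: keep only the entries that come
# before the first description containing an apostrophe.
parse_until_apostrophe <- function(x) {
  bad <- grepl("'", x, fixed = TRUE)
  if (!any(bad)) return(names(x))
  names(x)[seq_len(which(bad)[1] - 1)]
}

# Two different server orderings of the same four entries.
order_a <- descriptions[c(1, 3, 4, 2)]  # apostrophe entry comes last
order_b <- descriptions[c(3, 2, 1, 4)]  # apostrophe entry comes second

print(parse_until_apostrophe(order_a))  # mouse is included
print(parse_until_apostrophe(order_b))  # mouse is missing
```

With the first ordering the mouse dataset is returned; with the second it is lost, which matches the "sometimes valid, sometimes not" behaviour reported in this thread.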

written 14 months ago by Mike Smith (1.1k)

Thanks for clarifying. Do you know why that is in a random order? Is it not just a MySQL query on the backend (should preserve order every time unless randomized on purpose)?

Sorry, this is unrelated to the original question. I am just curious.

written 14 months ago by igor (7.3k)

I'm afraid I don't know enough about the BioMart internals to answer that, I just work with the API. It definitely added confusion to this issue though.

written 14 months ago by Mike Smith (1.1k)

No problem. It just seems like bizarre behavior (unlike having an uncommon character break code, which makes a lot of sense). I was just curious what happened.

written 14 months ago by igor (7.3k)

erwan.scaon (630), Limoges - CBRS - France, wrote 14 months ago:

Neilfws wrote: "I'd assume that Mouse C57BL/6NJ is closest to the previous mmusculus."

Be cautious: if you need Ensembl gene / transcript identifiers (such as ENSMUSG..., ENSMUST... for mouse), the "Mouse C57BL/6NJ genes" dataset will not do the trick:

library('biomaRt')
mc57bl6nj = useMart("ENSEMBL_MART_MOUSE",
                dataset="mc57bl6nj_gene_ensembl")

mc57bl6nj_infos <- getBM(attributes=c('ensembl_transcript_id',
                                  'ensembl_gene_id',
                                  'external_gene_name'),
                     mart = mc57bl6nj)

head(mc57bl6nj_infos)

  ensembl_transcript_id       ensembl_gene_id external_gene_name
1 MGP_C57BL6NJ_T0004927 MGP_C57BL6NJ_G0004247      A930041C12Rik
2 MGP_C57BL6NJ_T0088905 MGP_C57BL6NJ_G0033948               Polb
3 MGP_C57BL6NJ_T0062354 MGP_C57BL6NJ_G0027864             Chrnb2

These identifiers are in the "mmusculus_gene_ensembl" dataset, which is somewhat difficult to query at the moment (the exact same command will sometimes fail and sometimes work):

> mmusculus = useMart("ENSEMBL_MART_ENSEMBL",
+ dataset="mmusculus_gene_ensembl")
Error in checkDataset(dataset = dataset, mart = mart) : 
  The given dataset:  mmusculus_gene_ensembl , is not valid.  Correct dataset names can be obtained with the listDatasets() function.
> mmusculus = useMart("ENSEMBL_MART_ENSEMBL",
+ dataset="mmusculus_gene_ensembl")
Error in checkDataset(dataset = dataset, mart = mart) : 
  The given dataset:  mmusculus_gene_ensembl , is not valid.  Correct dataset names can be obtained with the listDatasets() function.
> mmusculus = useMart("ENSEMBL_MART_ENSEMBL",
+ dataset="mmusculus_gene_ensembl")

When it's working:

mmusculus = useMart("ENSEMBL_MART_ENSEMBL",
                    dataset="mmusculus_gene_ensembl")

mmusculus_infos <- getBM(attributes=c('ensembl_transcript_id',
                                      'ensembl_gene_id',
                                      'external_gene_name'),
                         mart = mmusculus)

head(mmusculus_infos)

ensembl_transcript_id ensembl_gene_id external_gene_name
ENSMUST00000082423 ENSMUSG00000064372 mt-Tp
ENSMUST00000082422 ENSMUSG00000064371 mt-Tt
ENSMUST00000082421 ENSMUSG00000064370 mt-Cytb

I hope this gets fixed soon.
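
Until a fix is installed, one possible stopgap for the intermittent failure shown above is to retry the call a few times. The sketch below is only an illustration: `retry` and `flaky` are hypothetical names, and `flaky` stands in for the real useMart() call so the example runs without network access.

```r
# Hypothetical helper: call f() until it succeeds or attempts run out.
retry <- function(f, attempts = 5, wait = 0) {
  for (i in seq_len(attempts)) {
    # Wrap the value in a list so a legitimate NULL result is not mistaken for failure.
    outcome <- tryCatch(list(value = f()), error = function(e) e)
    if (!inherits(outcome, "error")) return(outcome$value)
    if (i < attempts) Sys.sleep(wait)
  }
  stop("all ", attempts, " attempts failed; last error: ", conditionMessage(outcome))
}

# Stand-in for the flaky call: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("The given dataset is not valid.")
  "mart object"
}

result <- retry(flaky)
print(result)

# With biomaRt and network access, the same helper could wrap the real call:
# mmusculus <- retry(function() biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
#                                                dataset = "mmusculus_gene_ensembl"))
```

Retrying only hides the symptom. The answer from Mike Smith in this thread describes the underlying cause and the package update that resolves it.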

Neilfws (48k), Sydney, Australia, wrote 14 months ago:

The same is true of the Ensembl BioMart website and of other hosts e.g. www.ensembl.org.

It appears that in the latest build 91, mouse data has been moved to "Mouse strains 91", or ENSEMBL_MART_MOUSE:

listDatasets(useMart(host = "www.ensembl.org", biomart = "ENSEMBL_MART_MOUSE"))

                    dataset                              description        version
1     mcasteij_gene_ensembl       Mouse CAST/EiJ genes (CAST_EiJ_v1)    CAST_EiJ_v1
2   mnodshiltj_gene_ensembl   Mouse NOD/ShiLtJ genes (NOD_ShiLtJ_v1)  NOD_ShiLtJ_v1
3        makrj_gene_ensembl             Mouse AKR/J genes (AKR_J_v1)       AKR_J_v1
4      mbalbcj_gene_ensembl         Mouse BALB/cJ genes (BALB_cJ_v1)     BALB_cJ_v1
5    mnzohlltj_gene_ensembl     Mouse NZO/HlLtJ genes (NZO_HlLtJ_v1)   NZO_HlLtJ_v1
6  m129s1svimj_gene_ensembl Mouse 129S1/SvImJ genes (129S1_SvImJ_v1) 129S1_SvImJ_v1
7       mfvbnj_gene_ensembl           Mouse FVB/NJ genes (FVB_NJ_v1)      FVB_NJ_v1
8         mlpj_gene_ensembl               Mouse LP/J genes (LP_J_v1)        LP_J_v1
9       mdba2j_gene_ensembl           Mouse DBA/2J genes (DBA_2J_v1)      DBA_2J_v1 
10     mc3hhej_gene_ensembl         Mouse C3H/HeJ genes (C3H_HeJ_v1)     C3H_HeJ_v1
11     mwsbeij_gene_ensembl         Mouse WSB/EiJ genes (WSB_EiJ_v1)     WSB_EiJ_v1
12         maj_gene_ensembl                 Mouse A/J genes (A_J_v1)         A_J_v1
13     mpwkphj_gene_ensembl         Mouse PWK/PhJ genes (PWK_PhJ_v1)     PWK_PhJ_v1
14   mc57bl6nj_gene_ensembl     Mouse C57BL/6NJ genes (C57BL_6NJ_v1)   C57BL_6NJ_v1
15       mcbaj_gene_ensembl             Mouse CBA/J genes (CBA_J_v1)       CBA_J_v1

I'd assume that Mouse C57BL/6NJ is closest to the previous mmusculus.

You could also use the Ensembl 90 archive to access the old mmusculus_gene_ensembl:

useMart(host = "aug2017.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL")
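
Putting the archive host together with the query used elsewhere in this thread, a minimal sketch might look like the following. The function name `fetch_mouse_annotations` is hypothetical, and whether current biomaRt releases still accept this exact host string has not been verified here.

```r
# Hypothetical wrapper: query the Ensembl 90 archive for mouse annotations.
fetch_mouse_annotations <- function(host = "aug2017.archive.ensembl.org") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("the biomaRt package is required")
  }
  # Connect to the archived mart and select the mouse gene dataset.
  mart <- biomaRt::useMart(host = host,
                           biomart = "ENSEMBL_MART_ENSEMBL",
                           dataset = "mmusculus_gene_ensembl")
  # Same three attributes as queried earlier in this thread.
  biomaRt::getBM(attributes = c("ensembl_transcript_id",
                                "ensembl_gene_id",
                                "external_gene_name"),
                 mart = mart)
}

# Needs network access, so the call is left commented out:
# annotations <- fetch_mouse_annotations()
# head(annotations)
```

Pinning the host to a dated archive also keeps results reproducible across Ensembl releases.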