biomaRt: Timeout on getBM().
6 weeks ago
jon.klonowski ▴ 120

My biomaRt getBM() call is timing out, and I do not know why.

Failed <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "transcript_tsl"), mart = ensembl)
Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: [dec2021.archive.ensembl.org:443] Operation timed out after 300000 milliseconds with 9960752 bytes received

For comparison,

rawr <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "transcript_tsl"), mart = ensembl)

works fine.

How I get my mart:

library(biomaRt)
mart=useMart("ensembl", host = "https://dec2021.archive.ensembl.org")
ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)

Any reason in particular you are using https://dec2021.archive.ensembl.org?


My genomic variant annotation was done with Ensembl v105, so I am keeping all my versions consistent.


You can specify the Ensembl version directly with the version argument of useEnsembl().

ensembl <- useEnsembl("genes", "hsapiens_gene_ensembl", version=105)
6 weeks ago
Mike Smith ★ 1.9k

The issue here is that you're essentially doing a bulk download of annotation for the entire genome. The Ensembl BioMart service isn't really designed for that; it's aimed more at retrieving additional data points for a "small" set of genes or transcripts. Hence you hit the timeout limit when asking for too much information. I can't see a difference between the query that works and the one that fails, but I guess the working one was querying the current version of Ensembl rather than an archive. I suspect it works because you get slightly better performance out of the main site, so it manages to return a result before the 5-minute limit is reached.

If you really want whole-genome data, you're probably better off downloading the annotation from the Ensembl FTP (http://www.ensembl.org/info/data/ftp/index.html/) and working with those files locally, or using a genome annotation package such as ensembldb.

That said, you can "trick" biomaRt into helping with this by first asking for all possible gene IDs. Then provide these as a filter, and biomaRt will break your query down into several smaller parts, each of which completes within the time limit, and then stitch the results back into a single table for you. For example:

library(biomaRt)
ensembl <- useEnsembl("genes", "hsapiens_gene_ensembl", version=105)

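# Step 1: fetch every gene ID (a single attribute, so a much smaller result)
gene_ids <- getBM(attributes = c("ensembl_gene_id"), mart = ensembl)

# Step 2: pass the IDs back as filter values so biomaRt splits the query into batches
all_data <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "transcript_tsl"),
                  filters = "ensembl_gene_id",
                  values = gene_ids,
                  mart = ensembl)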

head(all_data)
#>   ensembl_transcript_id ensembl_gene_id                        transcript_tsl
#> 1       ENST00000469599 ENSG00000012817                                  tsl2
#> 2       ENST00000317961 ENSG00000012817 tsl1 (assigned to previous version 8)
#> 3       ENST00000541639 ENSG00000012817                                  tsl1
#> 4       ENST00000382806 ENSG00000012817                                  tsl1
#> 5       ENST00000492117 ENSG00000012817                                  tsl2
#> 6       ENST00000440077 ENSG00000012817                                  tsl5
dim(all_data)
#> [1] 266615      3
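
To illustrate the idea behind the batching described above, here is a minimal base-R sketch that splits a vector of IDs into batches, runs a function on each batch, and row-binds the results. It is not biomaRt's internal code: the helper names (chunk_ids, fetch_in_batches, fake_fetch), the batch size of 500, and the made-up IDs are all assumptions for illustration, and fake_fetch is an offline stand-in for a real getBM() call.

```r
# Split a vector of IDs into batches of at most `batch_size` elements.
chunk_ids <- function(ids, batch_size = 500) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Apply `fetch` to each batch and stitch the per-batch data frames together.
# `fetch` is a hypothetical callback: it takes a character vector of IDs
# and returns a data frame with one row per result.
fetch_in_batches <- function(ids, fetch, batch_size = 500) {
  batches <- chunk_ids(ids, batch_size)
  do.call(rbind, lapply(batches, fetch))
}

# Offline stand-in for a real query, so the sketch runs without network access.
# With biomaRt one might instead pass something like:
#   function(x) getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id"),
#                     filters = "ensembl_gene_id", values = x, mart = ensembl)
fake_fetch <- function(ids) {
  data.frame(ensembl_gene_id = ids, n_char = nchar(ids), stringsAsFactors = FALSE)
}

# Made-up IDs in the ENSG format, purely for demonstration.
ids <- sprintf("ENSG%011d", 1:1234)
result <- fetch_in_batches(ids, fake_fetch, batch_size = 500)
```

Each batch is small enough to finish on its own, which is why the same overall request can succeed where a single whole-genome query times out. In practice you should not need to do this by hand, since passing the IDs via filters and values lets biomaRt handle the splitting for you.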

Thank you so much. I tried both this approach and an EnsDb.
