Question: BioMart : the BioMart webservice returned an invalid result
3 months ago, amandinelecerfdefer wrote:

Hello,

I have a file containing a list of rsIDs, and I want to retrieve the gene and transcript names corresponding to each rsID. This is the script I use:

install.packages('BiocManager', repos='http://cran.us.r-project.org')
BiocManager::install(c("biomaRt"))

library(biomaRt)
Data <- read.delim("/Users/amandinelecerfdefer/Desktop/Modification_vcf/cut/rsID_origine.txt2.txt")

snpmart <-
  useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
T1<-Sys.time()
T1
res <- getBM(
  attributes = c(
    "refsnp_id",
    "ensembl_gene_stable_id",
    "ensembl_transcript_stable_id"
  ),
  filters = "snp_filter",
  values = Data$rsID,
  mart = snpmart,
  uniqueRows = TRUE
)

T2<-Sys.time()
T2
write.csv(res, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/name_cut/recovery_gene_trans_original2.txt")
Tdiff <- difftime(T2, T1)
Tdiff
write.csv(Tdiff, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/time/time2.txt")

Last week this script worked very well, but for the past few days it has failed every time with the same error.

This is the error:

> res <- getBM(
+   attributes = c(
+     "refsnp_id",
+     "ensembl_gene_stable_id",
+     "ensembl_transcript_stable_id"
+   ),
+   filters = "snp_filter",
+   values = Data$rsID,
+   mart = snpmart,
+   uniqueRows = TRUE
+ )
Batch submitting query [=======>-----------------------------------------------------]  13% eta:  2h
Error in getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "ensembl_transcript_stable_id"),  : 
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. 
Please report this on the support site at http://support.bioconductor.org

How can I fix this error and get the script working again?

Thank you.

snp biomart • 405 views
modified 10 weeks ago by Biostar • written 3 months ago by amandinelecerfdefer
3 months ago, Emily_Ensembl (EMBL-EBI) wrote:

I just ran your query with a couple of random rsIDs as values and had no problems. Can you give us a sample of your data? How long is your list of values?

written 3 months ago by Emily_Ensembl

Hi. My file is 17 million lines long. After getting this error, I decided to split it into sub-files of 100,000 lines each. Here is an excerpt from one of them:

rsID
rs142849724
rs141989890
rs193023236
rs187050627
rs115405973
rs542587725
rs140068063
rs185528550
rs539019715
rs571562101
rs190704807
rs571549807
rs143117458
rs115290438
rs114653362
rs190493256
rs192232546
rs139049437
rs186328231
rs189269980
rs530558338
rs568408968
rs377289156
rs116019130
rs190479833
written 3 months ago by amandinelecerfdefer

You can't use BioMart with a file 17 million lines long. You could use our APIs or you could parse the data out of the VCF files with consequences.

written 3 months ago by Emily_Ensembl

I suspected I couldn't do that with a 17-million-line file, but I tried it with 100,000 lines a few days ago and it worked. It no longer does.

written 3 months ago by amandinelecerfdefer

You can't use it for 100,000 either. We recommend a maximum of 500.

written 3 months ago by Emily_Ensembl

That's strange, because I did it once with 100,000 lines. Thank you for your answer. I will split my 17-million-line file into 500-line files to find the matches.

written 3 months ago by amandinelecerfdefer

Please don't do that either. You will jam up our servers. I recommend parsing the VCFs.

written 3 months ago by Emily_Ensembl

No problem, I will find another solution.

written 3 months ago by amandinelecerfdefer

From Mike Smith's earlier answer in "vector dimension limit in biomaRt", it seems that getBM() already does the batching internally?

I've modified the getBM() function in biomaRt to submit queries in batches if the number of values exceeds 500. If you have multiple filters each of which have more than 500 values it should generate multiple mutually exclusive queries so that all combinations are run without breaking the 500 value limit. All of this is done internally, so existing biomaRt scripts shouldn't need to be changed. It will also display a progress bar so you can tell it is still proceeding. This is available from biomaRt version 2.33.1
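
To put the 500-value limit in perspective, here is a minimal sketch of the arithmetic. `split_into_batches` is a hypothetical helper written for this illustration, not biomaRt's internal code; it only mimics the batch size described in the quote and never contacts a server. The rsIDs are made up and used for counting only.

```r
# Hypothetical helper (not part of biomaRt): split a vector of values
# into consecutive batches of at most `batch_size` elements.
split_into_batches <- function(values, batch_size = 500) {
  split(values, ceiling(seq_along(values) / batch_size))
}

# Made-up identifiers, used only to count batches; nothing is sent anywhere.
ids <- paste0("rs", seq_len(100000))
batches <- split_into_batches(ids)

# Number of separate queries a 100,000-value request turns into.
n_queries_subfile <- length(batches)

# Number of queries the full 17-million-line file would need.
n_queries_full <- ceiling(17e6 / 500)
```

With these numbers, one 100,000-line sub-file becomes 200 consecutive queries, and the full 17-million-line list would need 34,000. Internal batching keeps each query within the limit, but it does not reduce the total load placed on the server.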

written 3 months ago by SMK

I modified my request following the information in that post, but now a new error appears, this time a connection timeout:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection time-out
Calls: useMart ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted

biomaRt version: 2.40.0

written 3 months ago by amandinelecerfdefer

Seriously, please don't.

written 3 months ago by Emily_Ensembl

I only submitted a single request for 20,000 rsIDs, from one file and without any loop, because Mike says he extended getBM() to handle larger queries. I was testing Mike's update.

modified 3 months ago • written 3 months ago by amandinelecerfdefer

I told you another way.
Parse the VCF
Use the APIs
Use the VEP

Don't blame me when your IP address gets blocked for clogging up our servers.
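
As a rough illustration of the "parse the VCF" route, the sketch below pulls rsID, gene and transcript IDs out of a VCF entirely offline. It assumes the file carries a VEP-style `CSQ` INFO field whose pipe-separated layout is declared in the header; the consequence VCFs you download may use a different INFO key or field order, so check the header first. The function name `parse_csq`, the example records, and all IDs are made up for this example.

```r
# A tiny made-up VCF in VEP's CSQ style; real files are far larger.
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  paste0("##INFO=<ID=CSQ,Number=.,Type=String,Description=\"Consequence ",
         "annotations from Ensembl VEP. Format: ",
         "Allele|Consequence|SYMBOL|Gene|Feature_type|Feature\">"),
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  paste0("1\t1000\trs0000001\tA\tG\t.\t.\t",
         "CSQ=G|missense_variant|GENE1|ENSG00000000001|Transcript|ENST00000000001,",
         "G|intron_variant|GENE1|ENSG00000000001|Transcript|ENST00000000002"),
  "1\t2000\trs0000002\tC\tT\t.\t.\tCSQ=T|intergenic_variant||||"
)

# Hypothetical helper: returns one row per (rsID, gene, transcript).
parse_csq <- function(lines) {
  # Read the field order from the CSQ header line instead of hard-coding it.
  header <- grep("^##INFO=<ID=CSQ,", lines, value = TRUE)[1]
  fields <- strsplit(sub('.*Format: ([^"]*)".*', "\\1", header),
                     "|", fixed = TRUE)[[1]]
  n <- length(fields)

  records <- strsplit(lines[!startsWith(lines, "#")], "\t", fixed = TRUE)
  rows <- lapply(records, function(cols) {
    # Skip records that carry no CSQ annotation.
    if (!grepl("(^|;)CSQ=", cols[8])) return(NULL)
    csq <- sub("^(.*;)?CSQ=([^;]*).*$", "\\2", cols[8])
    entries <- strsplit(csq, ",", fixed = TRUE)[[1]]
    # strsplit() drops a trailing empty field, so pad each entry to n fields.
    parts <- lapply(strsplit(entries, "|", fixed = TRUE),
                    function(p) c(p, character(n))[seq_len(n)])
    mat <- matrix(unlist(parts), ncol = n, byrow = TRUE,
                  dimnames = list(NULL, fields))
    data.frame(refsnp_id = cols[3], mat, stringsAsFactors = FALSE)
  })
  unique(do.call(rbind, rows)[, c("refsnp_id", "Gene", "Feature")])
}

res <- parse_csq(vcf_lines)
```

To restrict the result to the rsIDs in a list, something like `res[res$refsnp_id %in% Data$rsID, ]` would do. For a real compressed file, the lines could be read in chunks with `readLines()` on a `gzfile()` connection rather than loaded all at once.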

modified 3 months ago • written 3 months ago by Emily_Ensembl

Thank you for your suggestions, I will explore these tools to find a more suitable one and thus avoid overloading the server.

written 3 months ago by amandinelecerfdefer