Question: BioMart : the BioMart webservice returned an invalid result
3 days ago, amandinelecerfdefer (0) wrote:

Hello,

I have a file containing a list of rsIDs, and I want to retrieve the genes and transcripts corresponding to each rsID. This is my script:

install.packages('BiocManager', repos='http://cran.us.r-project.org')
BiocManager::install(c("biomaRt"))

library(biomaRt)
Data <- read.delim("/Users/amandinelecerfdefer/Desktop/Modification_vcf/cut/rsID_origine.txt2.txt")

snpmart <-
  useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
T1<-Sys.time()
T1
res <- getBM(
  attributes = c(
    "refsnp_id",
    "ensembl_gene_stable_id",
    "ensembl_transcript_stable_id"
  ),
  filters = "snp_filter",
  values = Data$rsID,
  mart = snpmart,
  uniqueRows = TRUE
)

T2<-Sys.time()
T2
write.csv(res, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/name_cut/recovery_gene_trans_original2.txt")
Tdiff <- difftime(T2, T1)
Tdiff
write.csv(Tdiff, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/time/time2.txt")

Last week this script worked very well, but for the past few days it has failed every time with the same error.

I get this error:

> res <- getBM(
+   attributes = c(
+     "refsnp_id",
+     "ensembl_gene_stable_id",
+     "ensembl_transcript_stable_id"
+   ),
+   filters = "snp_filter",
+   values = Data$rsID,
+   mart = snpmart,
+   uniqueRows = TRUE
+ )
Batch submitting query [=======>-----------------------------------------------------]  13% eta:  2h
Error in getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "ensembl_transcript_stable_id"),  : 
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. 
Please report this on the support site at http://support.bioconductor.org

How can I fix this error and get the script working again?

Thank you.

snp biomart • 135 views
written 3 days ago by amandinelecerfdefer (0)

I just ran your query with a couple of random rsIDs as values and had no problems. Can you give us a sample of your data? How long is your list of values?

written 3 days ago by Emily_Ensembl (18k)

Hi. My file is 17 million lines long. After hitting this error, I decided to split it into sub-files of 100,000 lines each. Here is an excerpt from one of them:

rsID
rs142849724
rs141989890
rs193023236
rs187050627
rs115405973
rs542587725
rs140068063
rs185528550
rs539019715
rs571562101
rs190704807
rs571549807
rs143117458
rs115290438
rs114653362
rs190493256
rs192232546
rs139049437
rs186328231
rs189269980
rs530558338
rs568408968
rs377289156
rs116019130
rs190479833
written 3 days ago by amandinelecerfdefer (0)

You can't use BioMart with a file 17 million lines long. You could use our APIs or you could parse the data out of the VCF files with consequences.

written 3 days ago by Emily_Ensembl (18k)

I suspected I couldn't do that with a 17-million-line file, but I tried it with 100,000 lines a few days ago and it worked. It no longer does.

written 3 days ago by amandinelecerfdefer (0)

You can't use it for 100,000 either. We recommend a maximum of 500.

written 3 days ago by Emily_Ensembl (18k)

It's strange, because I did it once with 100,000 lines. Thank you for your answer. I will split my 17-million-line file into 500-line files to find the matches.

written 3 days ago by amandinelecerfdefer (0)

Please don't do that either. You will jam up our servers. I recommend parsing the VCFs.

written 3 days ago by Emily_Ensembl (18k)

No problem, I will find another solution.

written 3 days ago by amandinelecerfdefer (0)

According to Mike Smith's earlier answer in the thread "vector dimension limit in biomaRt", it seems there is already an internal mechanism that does the batching?

I've modified the getBM() function in biomaRt to submit queries in batches if the number of values exceeds 500. If you have multiple filters each of which have more than 500 values it should generate multiple mutually exclusive queries so that all combinations are run without breaking the 500 value limit. All of this is done internally, so existing biomaRt scripts shouldn't need to be changed. It will also display a progress bar so you can tell it is still proceeding. This is available from biomaRt version 2.33.1
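
As a rough illustration of the batching idea described in that quote, splitting a vector of values into chunks of at most 500 could look like the base R sketch below. This is not biomaRt's internal code; the helper name `split_into_batches` and its `batch_size` argument are hypothetical, and the rsIDs are generated placeholders. Batching only limits the size of each individual request; it does not reduce the total load that a very large list puts on the server.

```r
# Hypothetical helper, for illustration only (not biomaRt's implementation):
# split a vector of filter values into batches of at most `batch_size`.
split_into_batches <- function(values, batch_size = 500L) {
  stopifnot(batch_size >= 1L)
  if (length(values) == 0L) return(list())
  # ceiling(i / batch_size) is the batch number of the i-th value
  batch_index <- ceiling(seq_along(values) / batch_size)
  unname(split(values, batch_index))
}

# Placeholder IDs, generated only to show the batch sizes
ids <- paste0("rs", seq_len(1200))
batches <- split_into_batches(ids)
lengths(batches)  # 500 500 200
```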

written 3 days ago by SMK (1.3k)

I modified my request using the information given in that post, but now a new error appears, this time a connection timeout:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection time-out
Calls: useMart ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted

biomaRt version: 2.40.0

written 2 days ago by amandinelecerfdefer (0)

Seriously, please don't.

written 2 days ago by Emily_Ensembl (18k)

I only submitted a request for 20,000 rsIDs because, as Mike says, he extended getBM() to handle larger queries. I submitted a single file of 20,000 lines, without any loops, to test Mike's update.

modified 2 days ago • written 2 days ago by amandinelecerfdefer (0)

I told you another way.
Parse the VCF
Use the APIs
Use the VEP

Don't blame me when your IP address gets blocked for clogging up our servers.
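
As a minimal sketch of the "parse the VCF" route, the base R code below pulls the rsID, gene, and transcript out of each record locally, with no web service calls. It assumes a VEP-style annotated VCF in which the header declares the layout of the `CSQ` INFO field as `Format: ...|...`, and in which the `Gene` and `Feature` fields hold the gene and transcript IDs. That layout is an assumption about the input file, so check the header of your own VCF. The function name `parse_csq` and all IDs in the example are made-up placeholders.

```r
# Sketch under stated assumptions: VEP-style CSQ annotations whose field
# order is declared in the "##INFO=<ID=CSQ,..." header line.
parse_csq <- function(vcf_lines) {
  header <- grep("^##INFO=<ID=CSQ,", vcf_lines, value = TRUE)[1]
  # Field names come from the header, e.g. "Allele|Consequence|...|Feature"
  fields <- strsplit(sub('.*Format: ([^"]*)".*', "\\1", header),
                     "|", fixed = TRUE)[[1]]
  gene_col <- match("Gene", fields)
  transcript_col <- match("Feature", fields)

  records <- vcf_lines[!startsWith(vcf_lines, "#")]
  rows <- lapply(strsplit(records, "\t", fixed = TRUE), function(cols) {
    info <- strsplit(cols[8], ";", fixed = TRUE)[[1]]
    csq <- sub("^CSQ=", "", grep("^CSQ=", info, value = TRUE))
    if (length(csq) == 0L) return(NULL)  # record without annotation
    # One comma-separated entry per transcript consequence
    entries <- strsplit(strsplit(csq, ",", fixed = TRUE)[[1]],
                        "|", fixed = TRUE)
    data.frame(
      refsnp_id  = cols[3],
      gene       = vapply(entries, function(e) e[gene_col], character(1)),
      transcript = vapply(entries, function(e) e[transcript_col], character(1)),
      stringsAsFactors = FALSE
    )
  })
  rows <- Filter(Negate(is.null), rows)
  unique(do.call(rbind, rows))
}

# Tiny in-memory example with placeholder IDs (not real annotations)
example_vcf <- c(
  "##fileformat=VCFv4.2",
  '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene|Feature_type|Feature">',
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t1000\trs0000001\tA\tG\t.\t.\tCSQ=G|missense_variant|GENE1|ENSG_EXAMPLE_1|Transcript|ENST_EXAMPLE_1,G|intron_variant|GENE1|ENSG_EXAMPLE_1|Transcript|ENST_EXAMPLE_2"
)
res <- parse_csq(example_vcf)
print(res)
```

For a file with millions of records, the same function could be applied to chunks read with `readLines(con, n = ...)` on an open connection, rather than loading the whole file at once.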

modified 2 days ago • written 2 days ago by Emily_Ensembl (18k)

Thank you for your suggestions, I will explore these tools to find a more suitable one and thus avoid overloading the server.

written 2 days ago by amandinelecerfdefer (0)