Question: rentrez fetch error - unable to obtain query
0
gravatar for klrdna
13 months ago by
klrdna0
klrdna0 wrote:

Hello everyone,

I'm having a problem with the R package rentrez that I am hoping somebody might be able to help me with.

I am trying to download all mitochondrial COI sequences from Genbank in .gb format, using entrez_search followed by entrez_fetch. The number of records returned by entrez_search is quite large (> 2 million), so I have been using a for loop in rentrez_fetch to download the sequences in smaller chunks. I used this approach some months ago for another gene (mitochondrial 12S rRNA; ~200,000 gb entries) and it worked, but now I keep getting output files that don't contain any sequence entries and instead have the following message (one error message per for loop itineration):

 https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd"> <eFetchResult>   <ERROR>Unable to obtain query #1</ERROR> </eFetchResult>

I've tried this a number of times over the past few days and it keeps happening. Sometimes the download appears to go ok for a number of itinerations of the for loop, but then it starts throwing the same error. Other times I only see the error message and no sequences at all. I have tried doing the snail coi example in the rentrez tutorial and it seemed to work ok. I also repeated the script I used for the 12S gene - the first 20,000 entries downloaded ok, but then the same error message started appearing. I have tried using sys.sleep to create a pause between loops, to lessen the demand on the NCBI server and reducing the chunks to 500 sequences, but I still get the same error. I've also been running the commands after 9pm US eastern time, as per the NCBI guidelines. My R script is:

mitochondrial_COI_seqs_170118<- entrez_search(db="nuccore", term="(((((((((COX1) OR cytochrome c oxidase I) OR COI) OR MT-CO1) OR MTCO1) OR CO I) OR mitochondrially encoded cytochrome c oxidase I) OR main subunit of cytochrome c oxidase) OR cytochrome c oxidase subunit I) AND Mitochondrion[Filter]", use_history=TRUE)
mitochondrial_COI_seqs_170118

for( seq_start in seq(1,2692598,10000)){recs <- entrez_fetch(db="nuccore", web_history=mitochondrial_COI_seqs_170118$web_history,
                       rettype="gb", retmax=10000, retstart=seq_start)
  # if first in sequence create new file otherwise append to file
  if(seq_start==1){
    cat(recs, file="mitochondrial_COI_seqs_170118.gb")
  }else{
    cat(recs, file="mitochondrial_COI_seqs_170118.gb", append=TRUE)
  }
  # print status to console
  cat(seq_start+9999, "sequences downloaded\n")
}

Does anyone have any suggestions? I wondered if I should be breaking the entrez_search step into chunks as well - is that possible? I'm also confused about the difference between using entrez_post and creating a web history within entrez_search with use_history=TRUE.

Thanks in advance :)

entrez rentrez e-utils R ncbi • 485 views
ADD COMMENTlink written 13 months ago by klrdna0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 894 users visited in the last hour