SRA Toolkit Prefetch error when downloading dbGaP SRR files
4.0 years ago
Anthony.Knox ▴ 60

I am trying to download SRR files that require authorization through dbGaP, so I have to download them from the command line. I installed the Aspera Connect CLI as well as the Aspera Connect app, and imported the repository key file correctly. I am using prefetch 2.8.2 from the SRA Toolkit. My shell script reads a text file with one SRR accession per line and calls /path/to/prefetch.2.8.2 SRRXXXXXX for each one. In the output below I have cut the lines between dependencies 2 and 124, which downloaded successfully. The download never gets past dependency 124 of 299.

Why do I get this file descriptor error at dependency 124, after which the download fails? What are these dependencies, and is it essential that they download correctly? Hard disk space is not an issue.

Here is my shell script (I made it executable with chmod +x, not shown):

while IFS= read -r line || [[ -n "$line" ]]; do
    echo "$line"
    /Users/anthonyknox/Documents/AKnox/RNA-seq_analysis/Tuxedo_tools/sratoolkit.2.8.2-1-mac64/bin/prefetch.2.8.2 "$line"
done < "$1"

Here is my terminal session and the prefetch output:

otp1423081ots:~ anthonyknox$ PATH=$PATH:/Users/anthonyknox/Documents/AKnox/RNA-seq_analysis/Tuxedo_tools/sratoolkit.2.8.2-1-mac64/bin 
otp1423081ots:~ anthonyknox$ cd /Users/anthonyknox/ncbi/dbGaP-16260 
otp1423081ots:dbGaP-16260 anthonyknox$ vi 
otp1423081ots:dbGaP-16260 anthonyknox$ ./ /Users/anthonyknox/ncbi/Downloading\ SRR/RV_LV_SRR_1.txt 

2017-12-05T18:55:06 prefetch.2.8.2: 1) Downloading 'SRR1768659'...
2017-12-05T18:55:06 prefetch.2.8.2:  Downloading via fasp...
2017-12-05T18:55:51 prefetch.2.8.2:  fasp download succeed
2017-12-05T18:55:51 prefetch.2.8.2: 1) 'SRR1768659' was downloaded successfully
2017-12-05T18:57:31 prefetch.2.8.2: 'SRR1768659' has 299 unresolved dependencies
2017-12-05T18:57:32 prefetch.2.8.2: 2) Downloading 'ncbi-acc:CM000663.2?vdb-ctx=refseq'...
2017-12-05T18:57:32 prefetch.2.8.2:  Downloading via fasp...
2017-12-05T18:57:37 prefetch.2.8.2:  fasp download succeed
2017-12-05T18:57:37 prefetch.2.8.2: 2) 'ncbi-acc:CM000663.2?vdb-ctx=refseq' was downloaded successfully
2017-12-05T19:06:26 prefetch.2.8.2: 124) Downloading 'ncbi-acc:KI270529.1?vdb-ctx=refseq'...
2017-12-05T19:06:26 prefetch.2.8.2:  Downloading via fasp...
pipe() from: Too many open files
2017-12-05T19:06:26 prefetch.2.8.2 err: file descriptor failed while creating file descriptor - while pipe
pipe() to: Too many open files
2017-12-05T19:06:26 prefetch.2.8.2 err: file descriptor failed while creating file descriptor - while pipe
2017-12-05T19:06:26 prefetch.2.8.2:  fasp download failed
2017-12-05T19:06:26 prefetch.2.8.2:  Downloading via http...
2017-12-05T19:06:26 prefetch.2.8.2 int: connection not found while validating within network system module - ncbi-acc:KI270529.1?vdb-ctx=refseq: Cannot resolve remote
2017-12-05T19:06:26 prefetch.2.8.2: 124) failed to download ncbi-acc:KI270529.1?vdb-ctx=refseq
2017-12-05T19:06:30 prefetch.2.8.2: 'SRR1768659' has no remote vdbcache

Thank you in advance!

SRA Toolkit Prefetch RNA-Seq dbGaP ncbi • 3.3k views
4.0 years ago
Anthony.Knox ▴ 60

For anyone interested, it turns out that this is a bug where prefetch is leaking file descriptors. As a current workaround (for 2.8.2), just use this command before calling prefetch:

ulimit -n 10000

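This raises the maximum number of open file descriptors to 10000 (my default was 256). The limit applies only to the current shell session, so run the command in the same shell that calls prefetch, or put it at the top of the script.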
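
If it helps, the workaround above can be folded into the download script itself. The following is only a sketch under some assumptions: the function names raise_fd_limit and prefetch_list are made up for illustration, the prefetch command is assumed to be on PATH unless another command is passed as the second argument, and the requested limit is capped at the hard limit so the call does not fail on systems where 10000 is not allowed.

```bash
#!/usr/bin/env bash

# Raise the soft open-file limit to at most $1, capped at the hard limit.
# (Hypothetical helper; not part of the SRA Toolkit.)
raise_fd_limit() {
    local wanted=$1 soft hard
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    # A normal user cannot exceed the hard limit, so cap the request there.
    if [[ $hard != unlimited && $wanted -gt $hard ]]; then
        wanted=$hard
    fi
    # Only raise the limit; never lower a limit that is already high enough.
    if [[ $soft != unlimited && $soft -lt $wanted ]]; then
        ulimit -Sn "$wanted"
    fi
}

# Call the prefetch command ($2, default "prefetch") once per accession in file $1.
prefetch_list() {
    local list=$1 cmd=${2:-prefetch} acc
    while IFS= read -r acc || [[ -n "$acc" ]]; do
        [[ -z "$acc" ]] && continue          # skip blank lines
        echo "$acc"
        # Redirect stdin so the command cannot swallow the rest of the list.
        "$cmd" "$acc" < /dev/null || echo "prefetch failed for $acc" >&2
    done < "$list"
}
```

In a script, this would be used as raise_fd_limit 10000 followed by prefetch_list "$1". Because ulimit is a shell builtin, the new limit is inherited by every prefetch process started from that script.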

Thank you for coming back and providing closure to this thread.

