Question: fastq dump error
Satyajeet Khare (Pune, India) wrote 3.3 years ago:

Hi Biostars

I am trying to convert an SRA file from the PRJNA282735 dataset to FASTQ and I am getting the following error:

fastq-dump.2.1.7 fatal: SIGNAL - Segmentation fault

My fastq-dump command is

fastq-dump --split-3 SRR2016445.sra -O SRR2016445

I have not been able to find a similar error reported elsewhere. The ENA page for some samples of this dataset lists three files per SRX experiment (e.g. SRR2016445.fastq, SRR2016445_1.fastq and SRR2016445_2.fastq).

This is unusual for me, as I usually get one or two SRR runs per experiment (depending on whether the data are single-end or paired-end), but never three. I wonder whether this is the reason for the error.

Has anybody had a similar experience?

fastq-dump sra toolkit sra • 5.2k views

Are you using the latest SRA Toolkit? NCBI has moved to HTTPS-only connections. With v. 2.8 I get two files when dumping with fastq-dump --split-3 SRR2016445

(reply by genomax, 3.3 years ago)

Michael Dondrup (Bergen, Norway) wrote 3.3 years ago:

I think 2.1.7 is fairly old. With a slightly newer version, 2.4.2, I get:

fastq-dump.2.4.2 err: error unexpected while resolving tree within virtual file system module - failed to resolve accession 'SRR2016445' - Obsolete software. See https://github.com/ncbi/sra-tools/wiki ( 406 )

The latest release on GitHub is 2.8; you should download and install it as outlined here: https://github.com/ncbi/sra-tools/wiki
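
A quick way to tell whether an installed toolkit is older than 2.8 is to compare version strings before running the conversion. The following is only a minimal sketch and not part of the original answer: the helper name version_ge is hypothetical, it relies on GNU sort -V, and it assumes that fastq-dump --version prints a line ending in the version number (something like "fastq-dump : 2.8.0").

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the tool if it is actually installed.
if command -v fastq-dump >/dev/null 2>&1; then
    # Assumption: the version is the last field of the line containing a colon.
    installed=$(fastq-dump --version | awk '/:/ {print $NF; exit}')
    if version_ge "$installed" "2.8"; then
        echo "fastq-dump $installed should be recent enough"
    else
        echo "fastq-dump $installed is older than 2.8; consider upgrading"
    fi
fi
```

If the version cannot be parsed, the comparison fails and the script falls through to the upgrade message, which seems the safer default.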


@Mike, that was it! As expected, I get three FASTQ files for SRR2016445 using fastq-dump 2.8.

@genomax2, the download was not the issue; I used wget over FTP to download the SRA file. The conversion was the problem. Your suggestion was right, though.

(reply by Satyajeet Khare, 3.3 years ago)

piet (planet earth) wrote 3.3 years ago:

Please note that most SRA files are not self-contained; they depend on a reference sequence, which is a separate download. Thus it is not enough to download the SRA file with wget. fastq-dump will try to download the reference sequence behind the scenes before it extracts any reads. The reference sequence for SRR2016445 is https://www.ncbi.nlm.nih.gov/nuccore/149361431.


Thanks a lot! I tried the following command:

prefetch SRR2016445

A reference file was downloaded to the ~/public/refseq/ folder and the SRR file to the ~/public/sra/ folder. I could then split the SRR file into three FASTQ files using fastq-dump 2.8.0. I guess the small FASTQ file without the '_1' or '_2' suffix consists of unpaired reads.
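
One way to check that guess is to count the reads in each output file: the two paired files should hold the same number of reads, and the third, smaller file would hold whatever reads lack a mate. This is a minimal sketch, assuming the default four-lines-per-read FASTQ layout and that the files sit in a directory named SRR2016445 (as with the -O option used in the original command); count_reads is a hypothetical helper name, not an SRA Toolkit command.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per read.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report read counts for whichever of the three expected files exist.
for f in SRR2016445/SRR2016445_1.fastq \
         SRR2016445/SRR2016445_2.fastq \
         SRR2016445/SRR2016445.fastq; do
    if [ -f "$f" ]; then
        printf '%s\t%s reads\n' "$f" "$(count_reads "$f")"
    fi
done
```

A fractional count would suggest a truncated file or a layout other than four lines per read.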

For some reason, I am not able to convert the reference file 'NC_000072' from binary to FASTA using fastq-dump.

P.S. fastq-dump does not work very well for downloading. It downloads both the SRR file and the reference file, just like the prefetch command, but the files retain the .cache extension, which I believe indicates an incomplete download.

(reply by Satyajeet Khare, 3.3 years ago)

OK, so vdb-dump helped there. Here is the command:

vdb-dump.2.8.0 -f fasta1 --output-file NC_000072.5.fa NC_000072.5

(reply by Satyajeet Khare, 3.3 years ago)

I noticed some time ago that fastq-dump always places the .cache files in your home directory, even when you are writing the output to a different, possibly much larger, partition. This can easily fill up your home directory, and fastq-dump won't remove the files. I would therefore try running fastq-dump like this: HOME=./ fastq-dump --split-3 SRR2016445
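
Before redirecting HOME, it may help to see how much space leftover .cache files actually occupy. This is a minimal sketch: list_cache_files is a hypothetical helper, and the default directory $HOME/ncbi/public is an assumption based on the default path reported elsewhere in this thread, so adjust it to your own configuration.

```shell
# List files ending in .cache under a directory, with their sizes in KiB.
# The default directory is an assumption; pass your own as the first argument.
list_cache_files() {
    dir="${1:-$HOME/ncbi/public}"
    # Nothing to report if the directory does not exist.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -name '*.cache' -exec du -k {} +
}

list_cache_files
```

The function only lists files; deleting them is left as a manual step so that a download still in progress is not removed by accident.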

(reply by Michael Dondrup, 3.3 years ago)

No luck. I still get .cache files. I will try to figure out what's going wrong.

(reply by Satyajeet Khare, 3.3 years ago)

Running vdb-config -i lets you choose the directories that the SRA Toolkit will use. This needs to be done only once and requires X Windows (if run with -i). If you want a pure text version, run vdb-config -i --interactive-mode textual.

(reply by genomax, 3.3 years ago)

@genomax2,

The configuration looks fine. The default path is ncbi/public. There is no proxy and the rest of the settings are at their defaults. So why fastq-dump didn't download the SRR file properly (without the .cache extension) and why the reference file was not converted to FASTA remain a mystery to me. For now, I can get by with this three-step process (prefetch/fastq-dump/vdb-dump).

P.S. fastq-dump works fine for other datasets that don't have a refseq file.
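
The three-step process can be collected into a small script. This is a minimal sketch that only reuses the commands quoted earlier in this thread (with the plain vdb-dump name instead of vdb-dump.2.8.0); the run and sra_to_fastq helper names and the DRY_RUN variable are hypothetical. By default it only prints the commands, so nothing is downloaded until DRY_RUN=0 is set.

```shell
# Print each command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: run accession, $2: reference accession (both taken from this thread).
sra_to_fastq() {
    run prefetch "$1" &&
    run fastq-dump --split-3 "$1" -O "$1" &&
    run vdb-dump -f fasta1 --output-file "$2.fa" "$2"
}

sra_to_fastq SRR2016445 NC_000072.5
```

Chaining the steps with && stops the script at the first failing command, so a failed prefetch does not lead to a conversion attempt on a missing file.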

(reply by Satyajeet Khare, 3.3 years ago)

Brave people will just edit '~/.ncbi/user-settings.mkfg' with their favorite text editor. Having 'vdb-config' just to modify a simple config file is over-engineered.

(reply by piet, 3.3 years ago)