fastq-dump error
5.7 years ago
Satyajeet Khare

Hi Biostars

I am trying to convert SRA files from the PRJNA282735 dataset to FASTQ, and I am getting the following error:

fastq-dump.2.1.7 fatal: SIGNAL - Segmentation fault

My fastq-dump command is:

fastq-dump --split-3 SRR2016445.sra -O SRR2016445


I could not find a similar error reported elsewhere. On ENA, some samples of this dataset have three files per SRX experiment (e.g. SRR2016445.fastq, SRR2016445_1.fastq and SRR2016445_2.fastq).

This is unusual for me: I usually get one or two FASTQ files per run (depending on whether the data are single-end or paired-end), but never three. I am wondering if this is the reason for the error.

Anybody with similar experience?

Tags: fastq-dump, sra, sra toolkit

Are you using the latest SRA Toolkit? NCBI has moved to HTTPS-only connections. With v2.8, I get two files from:

fastq-dump --split-3 SRR2016445

I think 2.1 is fairly old. With the slightly newer version 2.4.2, I get:

fastq-dump.2.4.2 err: error unexpected while resolving tree within virtual file system module - failed to resolve accession 'SRR2016445' - Obsolete software. See https://github.com/ncbi/sra-tools/wiki ( 406 )

The latest release on GitHub is 2.8; you should download and install it as outlined here: https://github.com/ncbi/sra-tools/wiki
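If you are unsure which release is installed, a quick check is to compare the reported version against 2.8. This is a minimal bash sketch, assuming GNU sort (for -V version ordering); the helper names version_at_least and installed_fastq_dump_version are hypothetical, and pulling the first dotted x.y.z number out of the fastq-dump --version output is an assumption about that output's format.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU sort's -V (version sort).
version_at_least() {
    local have="$1" need="$2"
    # The smaller version sorts first; if that is $need, then $have >= $need.
    [ "$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)" = "$need" ]
}

# ASSUMPTION: `fastq-dump --version` prints a dotted x.y.z version somewhere
# in its output; we take the first such match (empty if none is found).
installed_fastq_dump_version() {
    fastq-dump --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}
```

For example, version_at_least "$(installed_fastq_dump_version)" 2.8.0 || echo "please upgrade" prints the warning for 2.1.7 or 2.4.2, and also when no version could be parsed, because an empty string sorts first.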

@Mike, that was it! With fastq-dump 2.8, I see three FASTQ files for SRR2016445, as expected.

piet

Please note that most SRA files are not self-contained: they depend on a reference sequence, which is a separate download. Thus it is not enough to download the SRA file with wget. fastq-dump will try to download the reference sequence behind the scenes before it extracts any reads. The reference sequence for SRR2016445 is https://www.ncbi.nlm.nih.gov/nuccore/149361431.

Thanks a lot! I tried the following command:

prefetch SRR2016445

A reference file was downloaded to the ~/public/refseq/ folder and the SRR file to the ~/public/sra/ folder. I could then split the SRR file into three FASTQ files using fastq-dump 2.8.0. I guess the small FASTQ file without the '_1' or '_2' suffix contains unpaired reads.
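One way to check that guess is to count the reads in each file. As we understand --split-3, reads whose mate is also dumpable go to the _1/_2 files, and reads left without a mate go to the file with no suffix, so _1 and _2 should have equal counts and the third file should be the odd one out. This bash sketch assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); count_fastq_reads is a hypothetical helper name.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print "<file><TAB><read count>" for each FASTQ file.
# ASSUMPTION: uncompressed FASTQ, exactly four lines per record
# (header, sequence, '+' line, qualities), no line wrapping.
count_fastq_reads() {
    local f lines
    for f in "$@"; do
        lines=$(wc -l < "$f")
        printf '%s\t%d\n' "$f" $(( lines / 4 ))
    done
}
```

For example, count_fastq_reads SRR2016445/SRR2016445*.fastq reports all three files side by side.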

For some reason, I am not able to convert the reference file 'NC_000072' from binary to FASTA using fastq-dump.

P.S. fastq-dump does not work very well for downloading. Like prefetch, it downloads both the SRR file and the reference file, but the files keep a .cache extension, which I believe indicates an incomplete download.
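To see which files were left in that state, you can list everything still ending in .cache under the repository directory. A minimal bash sketch; the default root ~/ncbi/public is an assumption (pass the directory your configuration actually uses if it differs), the helper name find_incomplete_downloads is hypothetical, and treating .cache as "incomplete" is the interpretation from this post rather than something verified here.

```bash
#!/usr/bin/env bash

# Hypothetical helper: list files still carrying a .cache extension.
# ASSUMPTION: default repository root is ~/ncbi/public; pass another root as $1.
find_incomplete_downloads() {
    local root="${1:-$HOME/ncbi/public}"
    find "$root" -type f -name '*.cache' -print
}
```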

OK, so vdb-dump helped there. Here is the command:

vdb-dump.2.8.0 -f fasta1 --output-file NC_000072.5.fa NC_000072.5

I noticed some time ago that fastq-dump always places the .cache files in your home directory, even when you download to a possibly much larger partition. This can easily fill up your home directory, and fastq-dump won't remove the files. I would therefore try running fastq-dump like this:

HOME=./ fastq-dump --split-3 SRR2016445

No luck; I still get .cache files. I will try to figure out what's going wrong.

Running vdb-config -i lets you choose the directories the SRA Toolkit will use. This only needs to be done once, and with -i it requires X Windows. If you want a pure text version, run vdb-config -i --interactive-mode textual.

@genomax2,

The configuration looks fine. The default path is ncbi/public, there is no proxy, and the rest of the settings are defaults. So why fastq-dump didn't download the SRR file properly (it kept the .cache extension) and why the reference file was not converted to FASTA is a mystery to me. For now, I can live with this three-step process (prefetch/fastq-dump/vdb-dump).
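The three-step workaround can be wrapped in a small bash function. This is a sketch only: it assumes the 2.8 tools are on PATH under their unversioned names (prefetch, fastq-dump, vdb-dump) and that you supply the reference accession yourself (NC_000072.5 for SRR2016445 in this thread). The DRY_RUN switch and the function names are hypothetical conveniences, not SRA Toolkit features; the tool arguments are the ones used in this thread.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical function: prefetch / fastq-dump / vdb-dump for one run.
# $1 = run accession, $2 = reference accession (must be known in advance).
sra_to_fastq_and_ref() {
    local run_acc="$1" ref_acc="$2"
    run prefetch "$run_acc"                                     # run + reference into the cache
    run fastq-dump --split-3 "$run_acc" -O "$run_acc"           # _1, _2 and unpaired FASTQ
    run vdb-dump -f fasta1 --output-file "$ref_acc.fa" "$ref_acc"  # reference to FASTA
}
```

For example, DRY_RUN=1 sra_to_fastq_and_ref SRR2016445 NC_000072.5 only prints the three commands; drop DRY_RUN to run them.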

P.S. fastq-dump works fine for other datasets that don't depend on a RefSeq file.

Brave people will just edit '~/.ncbi/user-settings.mkfg' with their favorite text editor. Using vdb-config to modify a simple config file is over-engineered.
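If you do edit the file by hand, the setting relevant to this thread is the public repository root. The bash sketch below only reads it back. It assumes the line looks like /repository/user/main/public/root = "/some/path", which is the form we have seen vdb-config write, so check your own file before relying on that key name; show_public_root is a hypothetical helper.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the public repository root from a settings file.
# ASSUMPTION: the file contains a line of the form
#   /repository/user/main/public/root = "/some/path"
show_public_root() {
    local cfg="${1:-$HOME/.ncbi/user-settings.mkfg}"
    sed -n 's|^/repository/user/main/public/root *= *"\(.*\)"$|\1|p' "$cfg"
}
```

To move the download location, you would change the quoted path on that line (or rerun vdb-config -i and let it rewrite the file).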