fasterq-dump not downloading both reads of a pair
18 months ago
Panos ★ 1.8k

I am trying to download some sequences from SRA using fasterq-dump. The problem is that, even though this particular dataset appears to be paired-end, I get only one file instead of two files with the "_1" and "_2" suffixes. This is probably due to how the original fastq reads were deposited in SRA. However, what I find confusing is the stdout of fasterq-dump:

spots read      : 26,841,751
reads read      : 53,683,502
reads written   : 26,841,751
reads 0-length  : 26,841,751

So, my question is: did it really read ~53 million reads, but write only half of them (~26 million)?

The exact command I used is:

fasterq-dump ERR392013 --split-files --seq-defline '@$sn[_$rn]/$ri' --threads 8 --progress
fasterq-dump • 1.0k views
18 months ago
GenoMax 142k

Looks like this is one of those datasets where there is a mismatch between the original submission and what is noted in SRA.

Even though the layout says PAIRED, if you look at the original data format as submitted, it looks like the dataset consists of two files from the same lane. So this should be a single-end dataset.

Thanks GenoMax! I have seen this quite a few times with different SRA data...

And do you have any idea why the number of reads read and the number of reads written differ?

This is another of those unfortunate SRA oddities. If you look at the record in SRA, you can see that each spot is supposed to contain one read of 100 bp and another of 0 bp (I know). That is why you get half the number of reads: fastq-dump removes reads of 0 length (you can add the option -M 0 to get them as well).

From your original post: reads written + reads of 0 length = reads read. The two reads come from the same spot.
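
As a quick sanity check of that bookkeeping, here is a minimal shell sketch that parses the four summary lines quoted in the original post and confirms the sum. The helper name `get_count` and the variable names are made up for illustration. The summary text is pasted in by hand rather than captured from a live fasterq-dump run.

```shell
# Summary lines copied from the original post (fasterq-dump output for ERR392013).
summary='spots read      : 26,841,751
reads read      : 53,683,502
reads written   : 26,841,751
reads 0-length  : 26,841,751'

# Hypothetical helper: print the number that follows the given label,
# with thousands separators and spaces removed.
get_count() {
    printf '%s\n' "$summary" | awk -F':' -v label="$1" '
        { key = $1; sub(/[ \t]+$/, "", key) }
        key == label { gsub(/[ ,]/, "", $2); print $2 }'
}

spots=$(get_count "spots read")
read_total=$(get_count "reads read")
written=$(get_count "reads written")
zero_len=$(get_count "reads 0-length")

# reads written + reads 0-length should equal reads read.
if [ $((written + zero_len)) -eq "$read_total" ]; then
    echo "consistent: written + 0-length = reads read"
else
    echo "mismatch"
fi

# Two reads per spot, one of which is empty in this dataset.
reads_per_spot=$((read_total / spots))
echo "reads per spot: $reads_per_spot"
```

With the numbers from the post, 26,841,751 + 26,841,751 = 53,683,502, so every spot contributes one written read and one 0-length read.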
