I am trying to download some sequences from SRA using fasterq-dump. The problem is that even though this particular dataset appears to be paired-end, I only get one file instead of two files with the "_1" and "_2" suffixes. This is probably due to how the original fastq reads were deposited in SRA. However, what I find confusing is the stdout of fasterq-dump:
spots read : 26,841,751
reads read : 53,683,502
reads written : 26,841,751
reads 0-length : 26,841,751
So, my question is: did it really read ~53 million reads but write only half of them (~26 million)?
The exact command I used is:
fasterq-dump ERR392013 --split-files --seq-defline '@$sn[_$rn]/$ri' --threads 8 --progress
Thanks GenoMax! I have seen this quite a few times with different SRA data...
And do you have any idea why the numbers of reads read and reads written would differ?
This is another of those unfortunate SRA oddities. If you look at the record in SRA, you can see that the data is supposed to contain one read of 100 bp and another of 0 bp (I know). That is why you get half the number of reads: fastq-dump removes reads of 0 length (you can add the option -M 0 to get them as well). From your original post: reads written + reads of 0 length = reads read. The two reads come from the same spot.
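As a quick sanity check of that identity, here is a minimal Python sketch that parses the summary from the original post and confirms the numbers add up. The helper name parse_summary is hypothetical, and the sketch assumes each summary line has the form "label : count" with commas as thousands separators, as in the output quoted above.

```python
def parse_summary(text):
    """Turn 'label : 1,234' lines into a {label: int} dictionary."""
    counts = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines or anything that is not a counter
        key, _, value = line.partition(":")
        counts[key.strip()] = int(value.strip().replace(",", ""))
    return counts


# Summary copied from the original post.
summary = """\
spots read : 26,841,751
reads read : 53,683,502
reads written : 26,841,751
reads 0-length : 26,841,751
"""

counts = parse_summary(summary)

# Every read is either written out or dropped for having length 0.
accounted = counts["reads written"] + counts["reads 0-length"]
print("written + 0-length == read:", accounted == counts["reads read"])

# Two reads per spot: the record is laid out as paired even though
# one read of each pair is empty.
print("reads per spot:", counts["reads read"] / counts["spots read"])
```

So nothing was lost: each spot holds two reads, one of which is empty, and only the non-empty one ends up in the output file.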