Question

fasterq-dump not downloading both reads of a pair

18 months ago

Panos ★ 1.8k

I am trying to download some sequences from SRA using fasterq-dump. The problem is that even though this particular dataset appears to be paired-end, I only get one file instead of two files with the "_1" and "_2" suffixes. This is probably due to how the original fastq reads were deposited in SRA. However, what I find confusing is the stdout of fasterq-dump:

spots read      : 26,841,751
reads read      : 53,683,502
reads written   : 26,841,751
reads 0-length  : 26,841,751

So, my question is: did it really read ~53 million reads but write only half of them (~26 million)?

The exact command I used is

fasterq-dump ERR392013 --split-files --seq-defline '@$sn[_$rn]/$ri' --threads 8 --progress

fasterq-dump • 1.0k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 18 months ago by Panos ★ 1.8k

score 2 · Accepted Answer · 2022-11-10

18 months ago

GenoMax 142k

It looks like this is one of those datasets where there is a mismatch between the original submission and what is noted in SRA.

Even though the layout says PAIRED, the original data format as submitted suggests that the dataset consists of two files from the same lane. So this should be a single-end dataset.

ADD COMMENT • link 18 months ago by GenoMax 142k

Thanks GenoMax! I have seen this quite a few times with different SRA data...

And do you have any idea why the numbers of reads read and reads written would differ?

ADD REPLY • link 18 months ago by Panos ★ 1.8k

This is another of those unfortunate SRA oddities. If you look at the record in SRA, you can see that the data is supposed to contain one read of 100 bp and another of 0 bp (I know). That is why you get half the number of reads: fasterq-dump removes reads of 0 length (you can add the option -M 0 to get them as well).

From your original post: reads written (26,841,751) + reads of 0 length (26,841,751) = reads read (53,683,502). The two reads come from the same spot, which is why the number of spots read is also 26,841,751.
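
As a quick sanity check on that arithmetic, here is a minimal shell sketch that uses the four counts from the fasterq-dump summary in the original post. The variable names are my own and are not anything fasterq-dump produces; the counts are simply typed in by hand.

```shell
# Counts copied by hand from the fasterq-dump summary in the original post
spots_read=26841751
reads_read=53683502
reads_written=26841751
reads_zero_length=26841751

# Written reads plus skipped 0-length reads should account for every read
accounted=$((reads_written + reads_zero_length))

# Integer division: how many reads each spot contributed
reads_per_spot=$((reads_read / spots_read))

if [ "$accounted" -eq "$reads_read" ]; then
    echo "All $reads_read reads accounted for ($reads_per_spot reads per spot)"
else
    echo "Mismatch: $accounted accounted for, but $reads_read were read"
fi
```

If the two totals agree, nothing was lost: every spot held two reads, and one of each pair was empty and therefore skipped.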

ADD REPLY • link 18 months ago by GenoMax 142k