Question: Number of reads in the downloaded fastq file
Asked 3 months ago by kmkdesilva (United States):


I am trying to download some data from SRA using fasterq-dump. This is the command I used:

    fasterq-dump --split-files --split-spot -O /path/fastq SRR3045676

I wanted to check whether I had downloaded all the reads for the accession. When I ran vdb-dump, it showed 166,306,903 sequence reads under this accession:

    vdb-dump --info SRR3045676
    SEQ: 166,306,903

The fasterq-dump output said it had read 332,613,806 (166,306,903 x 2) reads, but only 331,487,754 (165,743,877 x 2) were written:

    spots read    : 166,306,903
    reads read    : 332,613,806
    reads written : 331,487,754
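As a quick sanity check on these three figures, the arithmetic can be sketched in the shell. The variable names are illustrative; only the numbers come from the fasterq-dump summary above.

```shell
# Figures copied from the fasterq-dump summary (commas removed).
spots_read=166306903
reads_read=332613806
reads_written=331487754

# With two reads per spot, this is the number of reads expected.
expected_reads=$((spots_read * 2))

# Reads that were read from the accession but not written to any FASTQ file.
not_written=$((reads_read - reads_written))

echo "expected reads: $expected_reads"
echo "not written:    $not_written"
```

This shows that 1,126,052 reads were read but not written, without saying which file they are missing from.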

But when I used the following command to count the reads in the downloaded file (R1), it gave a number (165,180,851) lower than 165,743,877:

    echo $(zcat SRR2102500_R1.fastq.gz | wc -l)/4 | bc >> /path/readCount.txt
    165,180,851
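Since "reads written" is a total across all output files, it may help to count each file separately and compare the sum with that total. Below is a minimal sketch, assuming each FASTQ record occupies exactly four lines (no wrapped sequence or quality lines). The function name `count_fastq_reads` is hypothetical, and the files to count are passed as arguments.

```shell
# Count FASTQ records in one file, assuming four lines per record.
# Files ending in .gz are decompressed on the fly; others are read as plain text.
count_fastq_reads() {
    case $1 in
        *.gz) _lines=$(gzip -dc "$1" | wc -l) ;;
        *)    _lines=$(wc -l < "$1") ;;
    esac
    echo $((_lines / 4))
}

# Report the count for every file given on the command line, plus the sum.
total=0
for f in "$@"; do
    n=$(count_fastq_reads "$f")
    echo "$f: $n"
    total=$((total + n))
done
echo "total: $total"
```

Run against both mate files, the "total" line can be compared directly with "reads written". With the R1 count above, 331,487,754 - 165,180,851 = 166,306,903, which equals the spot count. So the figures would be consistent with the second file being complete and all unwritten reads missing from R1, but only counting the second file would confirm that.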

Can someone please explain why the output says fewer reads were written than were read, and why even fewer reads are found in the downloaded fastq file? I tried downloading this accession twice, and both attempts gave the same results. I also downloaded a few other accessions, and for each of them the final fastq file contained exactly the number of sequences reported by vdb-dump --info.

