Question: SRA: fastq-dump gives different number of sequences
2.2 years ago, jeetsahu wrote:

I downloaded reads with fastq-dump, using the --split-files option and an SRR ID for paired-end data. But the split files contain different numbers of reads. As I understand it, since these are paired-end reads, both files should contain the same number of sequences.

$ fastq-dump -I --split-files SRR390728

$ grep -c '>' SRR7716545_1.fastq

$ grep -c '>' SRR7716545_2.fastq

Please correct me if I am wrong.

sequence sra • 738 views
modified 22 months ago by Biostar • written 2.2 years ago by jeetsahu
2.2 years ago, ATpoint wrote:

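Both files have the same number of reads. You have to grep for '^@', because @ is the FASTQ header prefix; > is the FASTA header prefix. In a FASTQ file, > only appears as a quality character, so counting it tells you nothing about the number of reads. Note that a quality line can also begin with @, so the '^@' count can come out slightly too high.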

ls *.fastq | parallel "echo {} && grep -c '^@' {}"
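
A count that does not depend on the header character at all is the number of lines divided by four. The sketch below assumes every record occupies exactly four lines (no wrapped sequence or quality lines); `count_reads` and the demo file are hypothetical names made up for illustration, not part of fastq-dump. The demo file contains a quality line that starts with @, to show how the two counts can differ.

```shell
# Count FASTQ records as (number of lines) / 4.
# Assumption: each record is exactly four lines (header, sequence, '+', quality).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical demo file: two reads; the second quality line begins with '@'.
demo=$(mktemp)
printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'ACGT' '+' '@III' > "$demo"

grep -c '^@' "$demo"    # counts the two headers plus the quality line starting with '@'
count_reads "$demo"     # counts records
```

On the real files this would be `count_reads SRR7716545_1.fastq` and `count_reads SRR7716545_2.fastq`; if the assumption holds, the two numbers should match for properly paired data.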
written 2.2 years ago by ATpoint

Thanks, I grepped for the wrong symbol. One quick question: does fastq-dump give the latest dataset used for the assembly? If so, how can I get older datasets?

written 2.2 years ago by jeetsahu

fastq-dump returns the FASTQ for the SRR accession you give it. I have no detailed knowledge of your SRR.

written 2.2 years ago by ATpoint

Hello jeetsahu,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.


modified 2.2 years ago • written 2.2 years ago by finswimmer

