Question: Method to Check Fastq Completeness after Fastq-dump
3.2 years ago, Shicheng Guo (8.1k) wrote:

Hi All,

What is your method for checking that a FASTQ file is complete after downloading it from the SRA database with fastq-dump? I regularly end up with some incomplete FASTQ files after running fastq-dump.

Thanks.

completeness fastq-dump • 3.2k views
modified 15 months ago by ATpoint (32k) • written 3.2 years ago by Shicheng Guo (8.1k)

You should always check EBI-ENA first to see whether FASTQ files are available there, for example for the SRR# you posted below.

modified 3.2 years ago • written 3.2 years ago by genomax (80k)

See: How can I find SRA MD5 checksums for FASTQ files?
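
If a published checksum is available for the exact file you downloaded (for example a .fastq.gz fetched directly from an archive), comparing against it could look like the sketch below. The function name verify_md5 is made up for illustration, and it assumes GNU md5sum is installed. Note that a file produced locally with fastq-dump --gzip will generally not match a checksum published for an archive's own .fastq.gz, because the compressed bytes differ.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED_MD5   (hypothetical helper, not part of any toolkit)
# Returns 0 if the MD5 of FILE equals EXPECTED_MD5, 1 on mismatch, 2 if unreadable.
verify_md5() {
    local file=$1 expected=$2 actual
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # md5sum prints "<hash>  <filename>"; keep only the hash
    actual=$(md5sum "$file" | awk '{print $1}')
    # md5sum prints lowercase hex, so normalise the expected value too
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    if [ "$actual" = "$expected" ]; then
        echo "OK       $file"
    else
        echo "MISMATCH $file (got $actual, expected $expected)" >&2
        return 1
    fi
}
```

Usage would be something like verify_md5 SRR949203_1.fastq.gz followed by the checksum string listed by the archive.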

written 3.2 years ago by Pierre Lindenbaum (127k)

By the way, how do you resume a broken download with fastq-dump?

written 3.2 years ago by Shicheng Guo (8.1k)

17 months later and there is still no answer to this question. I have the same issue when dumping big files (~30G) and don't want to restart the download. How can a broken download be resumed with fastq-dump? Best

written 21 months ago by Samad (90)

Thanks. The method you mention works in some cases. However, in the majority of cases it doesn't work, for example when the FASTQ files come straight from:

fastq-dump --split-files --gzip SRR949203

If you just download the SRA files, I think it is okay to use:

 vdb-validate SRR949203
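
For the gzipped FASTQ files themselves, one rough structural check is sketched below. It is not an official SRA Toolkit feature; the function name check_fastq_gz is made up, and it assumes the plain four-lines-per-read layout (no wrapped sequence lines) plus gzip and awk being installed. It can catch truncated files, but it cannot tell whether every read of the run is present.

```shell
#!/usr/bin/env bash
# check_fastq_gz FILE.fastq.gz   (hypothetical helper)
# Returns 0 if the gzip stream is intact and every record has 4 well-formed lines.
check_fastq_gz() {
    local fq=$1
    # 1. A truncated or corrupt gzip stream fails gzip's built-in integrity test.
    gzip -t "$fq" 2>/dev/null || {
        echo "FAIL $fq: gzip stream is corrupt or truncated" >&2
        return 1
    }
    # 2. Walk the records: "@" header, sequence, "+" separator, quality of equal length.
    gzip -dc "$fq" | awk -v name="$fq" '
        function fail(msg) { printf "FAIL %s: %s (line %d)\n", name, msg, NR > "/dev/stderr"; bad = 1 }
        NR % 4 == 1 && substr($0, 1, 1) != "@" { fail("header does not start with @"); exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { fail("separator does not start with +"); exit }
        NR % 4 == 0 && length($0) != seqlen   { fail("quality length differs from sequence length"); exit }
        END {
            if (bad) exit 1
            if (NR == 0)     { fail("file is empty"); exit 1 }
            if (NR % 4 != 0) { fail("last record is incomplete"); exit 1 }
            printf "OK   %s: %d reads\n", name, NR / 4
        }'
}
```

Running check_fastq_gz on SRR949203_1.fastq.gz and SRR949203_2.fastq.gz would print a read count for each; for a properly paired run the two counts would normally be expected to agree.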
modified 3.2 years ago • written 3.2 years ago by Shicheng Guo (8.1k)
15 months ago, ATpoint (32k), Germany, wrote:

Just to update this: using fastq-dump for the download itself is not recommended. It is slow and prone to connection losses. It is better to use prefetch together with Aspera (see here) to get the SRA files, and then use fastq-dump to convert them to FASTQ. That said, you can get most data directly from the European Nucleotide Archive in FASTQ format. Downloading from there is simple and fast, see my tutorial: Fast download of FASTQ files and metadata from the European Nucleotide Archive (ENA). If you have to download from NCBI, e.g. because the data are restricted, go with prefetch followed by parallel-fastq-dump, which is a wrapper that parallelizes fastq-dump. After successfully converting an SRA file to FASTQ, both tools (fastq-dump/parallel-fastq-dump) print a summary message that only shows up if no errors occurred, so given that the message was printed, I have never felt the need to verify the FASTQ file after conversion.
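
As a rough sketch of the prefetch-then-convert route, the steps could be chained so that conversion only starts once the download has finished and validated. The helper names run and fetch_sra and the DRY_RUN switch are made up for illustration; the vdb-validate step is an optional extra borrowed from the comment above, and Aspera configuration is not shown. It assumes the SRA Toolkit commands are on the PATH and that fastq-dump picks up the copy fetched by prefetch.

```shell
#!/usr/bin/env bash
# run CMD...   (hypothetical helper)
# With DRY_RUN=1 the command is only printed, otherwise it is executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fetch_sra ACCESSION [OUTDIR]   (hypothetical helper)
# Stops at the first failing step so a broken download is never converted.
fetch_sra() {
    local acc=$1 outdir=${2:-.}
    run prefetch "$acc" || return 1        # download the SRA file
    run vdb-validate "$acc" || return 1    # check the downloaded archive
    run fastq-dump --split-files --gzip --outdir "$outdir" "$acc"   # convert to FASTQ
}
```

Calling DRY_RUN=1 fetch_sra SRR949203 fastq prints the three commands without running them, which is a cheap way to review them before starting a large download.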

modified 15 months ago • written 15 months ago by ATpoint (32k)

Hi ATpoint, how do I use Aspera on a Linux server?

written 15 months ago by Shicheng Guo (8.1k)

It is covered in the tutorial: Fast download of FASTQ files and metadata from the European Nucleotide Archive (ENA).

written 15 months ago by ATpoint (32k)