Tutorial: High Speed Downloading of SRA, SAM and Fastq Files
written 2.8 years ago by Wenhu_Cao

This is a brief tutorial on methods for downloading SRA, SAM and FASTQ files, focusing mainly on Aspera Connect.

NCBI-SRA and EBI-ENA databases

SRA (Sequence Read Archive): Maintained by NCBI (National Center for Biotechnology Information), it is a database that stores raw high-throughput sequencing (HTS) data, alignment information and metadata. Almost all HTS data behind published work is required to be uploaded here, where it is stored in the compressed .sra file format.

ENA (European Nucleotide Archive): Maintained by EBI (European Bioinformatics Institute). It serves the same function as SRA, but its richer annotation and friendlier website make it preferable. In addition, you can download fastq.gz files directly from it.
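
To illustrate where those fastq.gz files live, here is a minimal Python sketch of the directory layout that ENA documents for its FASTQ area, as far as I understand it. The function name `ena_fastq_dir` is hypothetical, and the layout rule (first six characters of the accession, plus a zero-padded subdirectory for accessions with more than six digits) is an assumption that should be checked against the file report ENA shows for your run.

```python
import re

# Run accessions start with SRR, ERR or DRR, followed by at least six digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d{6,}$")


def ena_fastq_dir(run):
    """Return the assumed ENA directory (relative to the server root) for a run."""
    if not RUN_ACCESSION.match(run):
        raise ValueError(f"not a run accession: {run!r}")
    prefix = run[:6]   # e.g. "SRR123"
    extra = run[9:]    # digits beyond the sixth, e.g. "7" in SRR1234567
    if extra:
        # Longer accessions get an extra, zero-padded subdirectory.
        return f"vol1/fastq/{prefix}/{extra.zfill(3)}/{run}"
    return f"vol1/fastq/{prefix}/{run}"
```

The file names inside that directory depend on the library layout (single or paired), so the sketch stops at the directory rather than guessing them.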

File Downloading

We usually download SRA files in order to obtain the corresponding FASTQ or SAM files, which we then use in our own pipelines for downstream analysis.

  1. Places: Search the ENA database first with the SRR (SRA Run) accession number to check whether the run is there. If it is not, go to the SRA database.

  2. Methods:

    • First Choice -- Aspera Connect. It is commercial high-speed file transfer software from IBM. Because of its agreements with NCBI and EBI, we can use it to download data from those two databases for free. Many sites can transfer data at 200-500 Mbps, and nearly all sites can transfer faster than 10 Mbps.

    • If Aspera Connect does not work, I recommend the prefetch command from sratoolkit.

    • As a last resort, try fastq-dump and sam-dump from sratoolkit. If the fastq-dump connection is unstable, I suggest the wonderdump script from the Biostar Handbook.
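
The order of preference above can be sketched in Python as a list of commands that are tried one after another. This is only an illustration under stated assumptions: the function names are hypothetical, the `remote_path` and `key_path` arguments must be supplied by you (the key file location depends on your Aspera installation), the 300m rate cap is an arbitrary choice, and the ENA host and port shown are what I understand ENA to document, so please verify them. sam-dump and wonderdump are left out for brevity.

```python
import shutil
import subprocess


def candidate_commands(run, remote_path, key_path, outdir="."):
    """Return (label, argv) pairs in the order of preference described above."""
    return [
        # Aspera: -Q fair transfer policy, -T no encryption, -l rate cap,
        # -P SSH port, -i private key. Host and port are assumptions.
        ("ascp", ["ascp", "-QT", "-l", "300m", "-P", "33001", "-i", key_path,
                  f"era-fasp@fasp.sra.ebi.ac.uk:{remote_path}", outdir]),
        # prefetch fetches the .sra file; it still has to be converted afterwards.
        ("prefetch", ["prefetch", run]),
        # fastq-dump downloads and converts to FASTQ in one step.
        ("fastq-dump", ["fastq-dump", "--split-files", "--gzip",
                        "--outdir", outdir, run]),
    ]


def download_with_fallback(commands, which=shutil.which, run=subprocess.run):
    """Run each installed tool in turn; return the label of the first success."""
    for label, argv in commands:
        if which(argv[0]) is None:
            continue  # tool is not installed, move on to the next choice
        if run(argv).returncode == 0:
            return label
    return None  # every available tool failed
```

Calling `download_with_fallback(candidate_commands(...))` returns the label of the tool that succeeded, or `None` if all of them failed, so a wrapper script can log which method was actually used.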

Warning: Avoid downloading with wget or curl; the downloaded SRA files may be incomplete.
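
Whichever tool you use, one way to detect an incomplete download is to compare the file's MD5 checksum with the published one. The Python sketch below assumes you already have the expected checksum (ENA, for example, lists one per FASTQ file in its file report, as far as I know); the function names `md5_of_file` and `is_complete` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ/SRA files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path, expected_md5):
    """Return True if the file's checksum matches the expected one."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch means the file should be downloaded again before it goes into any pipeline.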

...

For details on installing and using ascp, the Aspera Connect command-line tool, please read my blog post: High Speed Downloading of SRA, SAM and Fastq Files


You can also speed up fastq-dump by using the parallel version: https://github.com/rvalieris/parallel-fastq-dump

written 2.8 years ago by James Ashmore

I will try it, thanks!

written 2.8 years ago by Wenhu_Cao

Thanks Wenhu_Cao,

Also see this related post, which has a few more details: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

written 12 months ago by ATpoint