Tutorial: High Speed Downloading of SRA, SAM and Fastq Files
6.2 years ago
Wenhu_Cao ▴ 100

This is a brief tutorial on methods for downloading sra, sam and fastq files, focusing mainly on Aspera Connect.

NCBI-SRA and EBI-ENA databases

SRA: Sequence Read Archive. Maintained by NCBI (National Center for Biotechnology Information), it is a database that stores high-throughput sequencing (HTS) raw data, alignment information and metadata. Authors are asked to upload almost all HTS data from published work here, where it is stored in the compressed .sra file format.

ENA: European Nucleotide Archive. Maintained by EBI (European Bioinformatics Institute), it serves the same function as SRA, but its richer annotations and friendlier website make it preferable. What's more, you can download fastq.gz files directly from it.
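To check whether ENA holds fastq.gz files for a run, one option is the ENA Portal API file report. Below is a minimal Python sketch, assuming the `filereport` endpoint and the `fastq_ftp` / `fastq_md5` field names as I understand them (please check them against the current ENA documentation). The function names are my own, and SRR1553610 is only an example accession.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(run_accession):
    """Build the file report URL for one run accession (e.g. an SRR number)."""
    params = {
        "accession": run_accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Return a list of (file_location, md5) pairs from the TSV report.

    Paired-end runs list several files in one cell, separated by ';'.
    An empty fastq_ftp cell means ENA has no fastq files for that run.
    """
    files = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        locations = [u for u in row["fastq_ftp"].split(";") if u]
        checksums = [m for m in row["fastq_md5"].split(";") if m]
        files.extend(zip(locations, checksums))
    return files


if __name__ == "__main__":
    # Open this URL in a browser (or fetch it) and pass the text to parse_filereport.
    print(filereport_url("SRR1553610"))
```

If `parse_filereport` returns an empty list for your run, that is the cue to fall back to the SRA database.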

File Downloading

Mostly, we download sra files to get the corresponding fastq or sam files, which we then use in our own pipelines for downstream analysis.

  1. Places: Search the ENA database first with the SRR (SRA Run) accession number to check whether the run is there. If not, go to the SRA database.
  2. Methods:

    • First choice -- Aspera Connect. It is commercial high-speed file transfer software from IBM. Because it has agreements with NCBI and EBI, we can use it to download data from those two databases for free. Many sites can transfer data at 200-500 Mbps, and nearly all sites can transfer faster than 10 Mbps.
    • If Aspera Connect doesn't work, I would recommend the prefetch command in sratoolkit.
    • As a last resort, try fastq-dump and sam-dump in sratoolkit. If the fastq-dump connection is unstable, I would suggest the wonderdump script from the Biostar Handbook.
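The order of preference above can be sketched as a small Python helper that tries each tool in turn. This is only an illustration under some assumptions: the ascp options follow the form shown in ENA's Aspera instructions (`-QT -l 300m -P 33001 -i <key>`), the key file location depends on your Aspera Connect installation, and `download_plan` and `run_first_success` are names I made up for this example.

```python
import subprocess


def download_plan(run, ena_remote_path, key_file, dest_dir="."):
    """Return the candidate commands in order of preference."""
    return [
        # 1. Aspera Connect from ENA (remote path comes from the ENA file report).
        ["ascp", "-QT", "-l", "300m", "-P", "33001", "-i", key_file,
         "era-fasp@fasp.sra.ebi.ac.uk:" + ena_remote_path, dest_dir],
        # 2. prefetch from sratoolkit (downloads the .sra file).
        ["prefetch", run],
        # 3. fastq-dump from sratoolkit (downloads and converts to fastq.gz).
        ["fastq-dump", "--split-files", "--gzip", run],
    ]


def run_first_success(commands, runner=subprocess.run):
    """Run commands in order; return the first that exits with status 0."""
    for cmd in commands:
        try:
            result = runner(cmd, check=False)
        except FileNotFoundError:
            # The tool is not installed; move on to the next choice.
            continue
        if result.returncode == 0:
            return cmd
    return None
```

Note that the three commands do not produce the same file type: ascp from ENA and fastq-dump give you fastq.gz files, while prefetch gives you an .sra file that still needs converting.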

Warning: Try not to download with wget or curl; the downloaded sra files may be incomplete.
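Whichever tool you use, an incomplete file can be caught by comparing its MD5 checksum with the one the archive publishes (ENA lists MD5 values for its fastq files). Here is a minimal sketch using only the Python standard library; the function names are my own.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch means the file is truncated or corrupted and should be downloaded again.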

For details on how to install and use ascp, the Aspera Connect command line tool, please read my blog: High Speed Downloading of SRA, SAM and Fastq Files

fastq sam Aspera-Connect sra

You can also speed up fastq-dump by using the parallel version: https://github.com/rvalieris/parallel-fastq-dump


I would try it, thx!


Thanks Wenhu_Cao,

Also see this related post for a few more details: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

