Tutorial: High Speed Downloading of SRA, SAM and Fastq Files
Written 18 months ago by Wenhu_Cao:

This is a brief tutorial on downloading sra, sam and fastq files, focusing mainly on Aspera Connect.

NCBI-SRA and EBI-ENA databases

SRA (Sequence Read Archive): Run by NCBI (National Center for Biotechnology Information), it is a database that stores high-throughput sequencing (HTS) raw data, alignment information and metadata. Authors are asked to upload almost all HTS data from published papers here, where it is stored in the compressed .sra file format.

ENA (European Nucleotide Archive): Run by EBI (European Bioinformatics Institute). Although it serves the same function as SRA, its richer annotations and friendlier website make it preferable. What's more, you can download fastq.gz files directly from it.
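To illustrate how a run's fastq.gz files might be located on ENA, here is a minimal sketch that builds a query for the ENA Portal API file report and parses its tab-separated reply. The endpoint and the field names (fastq_ftp, fastq_md5) are what I understand the API to offer, so treat them as assumptions and check the ENA documentation; the helper names build_filereport_url and parse_filereport are made up for this example.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report; verify against ENA docs.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a file-report query URL for one run accession (e.g. an SRR id)."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Return a list of (file_url, md5) pairs from a TSV file report.

    Paired-end runs list several files in one cell, separated by ';'.
    An empty list means ENA reported no fastq files for the run.
    """
    files = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urls = [u for u in (row.get("fastq_ftp") or "").split(";") if u]
        sums = [m for m in (row.get("fastq_md5") or "").split(";") if m]
        files.extend(zip(urls, sums))
    return files
```

The report text itself could be fetched with urllib.request.urlopen(build_filereport_url(accession)).read().decode(). If parse_filereport returns an empty list, the run is probably not available as fastq on ENA and SRA is the next place to look.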

File Downloading

Mostly, we download sra files to get the corresponding fastq or sam files, which we then use in our own pipelines for downstream analysis.

  1. Places: Search the ENA database first with the SRR (SRA Run) accession number to check whether the run is there. If it is not, go to the SRA database.

  2. Methods:

    • First choice -- Aspera Connect. It is commercial high-speed file transfer software from IBM. Because of its agreements with NCBI and EBI, we can use it to download data from those two databases for free. Many sites can transfer data at 200-500 Mbps, and nearly all sites can transfer at more than 10 Mbps.

    • If Aspera Connect doesn't work, I recommend the prefetch command in sratoolkit.

    • As a last resort, try fastq-dump and sam-dump in sratoolkit. If the fastq-dump connection is unstable, I suggest the wonderdump script from the Biostar Handbook.
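The order of preference above can be sketched as a small script that builds one command per method and tries them in turn. This is only a rough illustration: the ascp options (-QT, -l, -P33001, -i) and the era-fasp@fasp.sra.ebi.ac.uk host follow the form commonly shown for ENA downloads, while the remote path, the key file location and the function names are placeholders you must supply or adapt yourself. Note that the three methods do not produce the same file type (ascp fetches a fastq.gz from ENA here, prefetch fetches an .sra file).

```python
import subprocess


def download_plan(accession, ena_remote_path, key_file, dest_dir=".", rate="300m"):
    """Return candidate commands in the order suggested in the post."""
    return [
        # 1. Aspera Connect (ascp) against the ENA server; rate caps the speed.
        ["ascp", "-QT", "-l", rate, "-P", "33001", "-i", key_file,
         "era-fasp@fasp.sra.ebi.ac.uk:" + ena_remote_path, dest_dir],
        # 2. prefetch from sratoolkit downloads the .sra file for the run.
        ["prefetch", accession],
        # 3. fastq-dump fetches and converts in one step.
        ["fastq-dump", "--split-files", "--gzip", accession],
    ]


def _run(cmd):
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_first_success(commands, runner=_run):
    """Try each command in order; return the first that exits with code 0."""
    for cmd in commands:
        try:
            if runner(cmd) == 0:
                return cmd
        except FileNotFoundError:
            # The tool is not installed; move on to the next method.
            continue
    return None
```

A call might look like run_first_success(download_plan("SRR1234567", remote_path, key_file)), where the accession is a made-up example and remote_path and key_file are values you have looked up yourself.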

Warning: Try not to download with wget or curl; the downloaded sra files may be incomplete.

...

For details on how to install and use ascp, the Aspera Connect command-line tool, please read my blog post: High Speed Downloading of SRA, SAM and Fastq Files


You can also speed up fastq-dump by using the parallel version: https://github.com/rvalieris/parallel-fastq-dump

Comment written 18 months ago by James Ashmore

I will try it, thanks!

Reply written 18 months ago by Wenhu_Cao