Could somebody please explain in simple terms how to use SRA-format files from the NCBI Sequence Read Archive? The files are large, so I use the Aspera plugin to download them. The NCBI documentation (http://www.ncbi.nlm.nih.gov/books/NBK47540/#SRA_Download_Guid_B.3_Installing_the_Too) is hard to follow. I need to convert the files to FASTA, FASTQ or SFF. Thanks for any help.
You need SRA-Toolkit to extract what you want from an SRA archive (a mixture of raw reads and other metadata).
For instance, I have this ChIP-Seq data in .sra format here. I will just use wget to pull it and then extract the FASTQ files from it using fastq-dump, a tool included in SRA-Toolkit.
Usage:
  sratoolkit/fastq-dump [options] [ -A ] <accession>
  sratoolkit/fastq-dump [options] <path [path...]>
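To make the two usage forms concrete, here is a minimal Python sketch that builds the fastq-dump command line, either from an accession (first form) or from a local .sra file you already downloaded with wget or Aspera (second form). The options used (--outdir, --split-files, --gzip, --fasta, -A) are fastq-dump options as I understand them; please check them against the manual for your toolkit version. The function names and the accession SRRXXXXXXX are hypothetical placeholders, not part of SRA-Toolkit.

```python
import shlex
import shutil
import subprocess
from typing import List


def build_fastq_dump_args(
    source: str,
    is_accession: bool = True,
    outdir: str = ".",
    split_files: bool = True,
    gzip: bool = False,
    fasta: bool = False,
) -> List[str]:
    """Return the fastq-dump argument list; nothing is executed here."""
    args = ["fastq-dump", "--outdir", outdir]
    if split_files:
        # Paired-end runs: mates go to separate <run>_1 / <run>_2 files.
        args.append("--split-files")
    if gzip:
        # Compress the output files.
        args.append("--gzip")
    if fasta:
        # FASTA instead of FASTQ (no qualities); width 0 means no line wrapping.
        args += ["--fasta", "0"]
    if is_accession:
        # Usage form 1: [ -A ] <accession>.
        args += ["-A", source]
    else:
        # Usage form 2: <path>, e.g. a .sra file fetched with wget or Aspera.
        args.append(source)
    return args


def run_fastq_dump(args: List[str]) -> None:
    """Run fastq-dump, failing early if SRA-Toolkit is not on PATH."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("fastq-dump not found on PATH; install SRA-Toolkit first")
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Dry run only: print the command. SRRXXXXXXX is a hypothetical placeholder.
    cmd = build_fastq_dump_args("SRRXXXXXXX", gzip=True)
    print(shlex.join(cmd))
```

The dry run prints `fastq-dump --outdir . --split-files --gzip -A SRRXXXXXXX`. Keeping command construction separate from execution lets you inspect the command before calling run_fastq_dump on it.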
Check the complete fastq-dump manual for all options.
Grab your copy of SRA-Toolkit for your operating system and architecture.
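Since the question also asks for FASTA: fastq-dump can write FASTA directly (its --fasta option), but if you already have FASTQ files, converting them is straightforward. This is a minimal sketch that assumes plain 4-line FASTQ records (header, sequence, "+" line, qualities) with no wrapped sequence lines; the function name fastq_to_fasta is hypothetical.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert 4-line FASTQ records into 2-line FASTA records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between or after records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it)
            next(it)  # quality string, not needed for FASTA
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Keep the read name and description, swapping '@' for '>'.
        yield f">{header[1:]}\n{seq}\n"


if __name__ == "__main__":
    demo = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    print("".join(fastq_to_fasta(demo)), end="")
```

On a real file you would stream it, e.g. `with open("reads.fastq") as src, open("reads.fasta", "w") as dst: dst.writelines(fastq_to_fasta(src))` (file names hypothetical), which never loads the whole file into memory.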
I know this is an old question, but I've just spent the afternoon wrestling with SRA-Toolkit as suggested by other answers, and either my computer doesn't like it or I'm too stupid to make it work. So I thought I should point out that there is a way to solve this problem using Galaxy, for computer illiterates like me! Go to usegalaxy.org. Under the "NCBI SRA Tools" tab there are options for extracting reads in BAM or FASTQ format; all you have to do is enter the accession number!