Question: Download slices of SRA/FastQ files?
20 months ago, juara wrote:


I am interested in specific loci of the genome in the TARGET WGS data, so I am trying to download only those regions rather than the whole SRA files, which take forever to download. So far, the only approach I have found is to download the SRA files, convert them to BAM, extract the slice of interest, and delete the rest of the data. The problem is that converting SRA to BAM takes a lot of time, and I have quite a number of files to process.
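The download, convert, slice, delete workflow described above could be scripted per run roughly as follows. This is a minimal sketch, assuming the SRA Toolkit (`prefetch`, `sam-dump`) and `samtools` are installed and on `PATH`; the helper `slice_run_commands` and the accession/region in the example are hypothetical, and `sam-dump` options (e.g. for headers or unaligned reads) may need adjusting for your runs. It only builds the shell commands so you can inspect them or hand them to a scheduler; it does not make the SRA-to-BAM conversion itself any faster.

```python
import shlex


def slice_run_commands(accession: str, region: str, workdir: str = ".") -> list[str]:
    """Build shell commands to download one SRA run, convert it to a sorted
    BAM, extract one region, and delete the full-size BAM.

    Assumes prefetch/sam-dump (SRA Toolkit) and samtools are on PATH.
    """
    acc = shlex.quote(accession)
    full_bam = shlex.quote(f"{workdir}/{accession}.sorted.bam")
    full_bai = shlex.quote(f"{workdir}/{accession}.sorted.bam.bai")
    slice_bam = shlex.quote(f"{workdir}/{accession}.slice.bam")
    reg = shlex.quote(region)
    return [
        # 1. Download the .sra file (its location depends on your toolkit config)
        f"prefetch {acc}",
        # 2. Dump alignments as SAM and coordinate-sort them into a BAM
        f"sam-dump {acc} | samtools sort -o {full_bam} -",
        # 3. Index the BAM; region queries need a sorted, indexed file
        f"samtools index {full_bam}",
        # 4. Keep only reads overlapping the region of interest
        f"samtools view -b -o {slice_bam} {full_bam} {reg}",
        # 5. Remove the full-size BAM and its index, keeping only the slice
        f"rm {full_bam} {full_bai}",
    ]


if __name__ == "__main__":
    # Hypothetical accession and region; substitute your own.
    for cmd in slice_run_commands("SRR0000000", "chr1:1000000-1010000"):
        print(cmd)
```

Since each run is processed independently, running one command list per accession in parallel (for example as separate cluster jobs) is probably the main way to cut the total wall-clock time.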

I know the GDC has a built-in BAM slicing function; however, it does not yet contain the complete TARGET data. I am looking for something similar on other platforms.
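For reference, the GDC slicing mentioned above is exposed through its REST API. Below is a minimal sketch of building such a request with Python's standard library, assuming (as I understand the GDC API documentation) a `slicing/view/{file_id}` endpoint that takes repeated `region` query parameters and an `X-Auth-Token` header for controlled-access files; check the current GDC docs before relying on it. The helper `gdc_slice_request`, the file UUID, and the token are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed GDC BAM-slicing endpoint; verify against the GDC API documentation.
GDC_SLICING = "https://api.gdc.cancer.gov/slicing/view/"


def gdc_slice_request(file_id: str, regions: list[str], token: str) -> Request:
    """Build (but do not send) a GDC BAM-slicing request for the given regions."""
    # One region=... parameter per region, e.g. "chr1:100-200" or "chr2"
    query = urlencode([("region", r) for r in regions])
    return Request(
        f"{GDC_SLICING}{file_id}?{query}",
        # Controlled-access data (such as TARGET WGS) need an auth token
        headers={"X-Auth-Token": token},
    )
```

Passing the returned `Request` to `urllib.request.urlopen` and writing the response body to a file should give a BAM containing only the reads from the requested regions. This only helps for TARGET files that are already in the GDC.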

Is there an API on dbGaP or the SRA website that lets me slice SRA files, so that I do not have to download the whole WGS files?



FASTQ files contain unaligned reads. They carry no positional (genomic coordinate) information, so they cannot be subset by region before alignment. What you can do is check whether your data of interest are mirrored at the European Nucleotide Archive (ENA). ENA mirrors most NCBI data as FASTQ instead of this terrible SRA format, and offers download via Aspera (up to 100 Mb/s), which speeds up at least the data-acquisition step. The alignment will still take time.
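To check whether a run is mirrored at ENA and collect its FASTQ links, ENA's Portal API `filereport` endpoint can be queried per run accession. The sketch below assumes the `result=read_run` report with the `fastq_ftp` field, which (as far as I know) returns a tab-separated table whose `fastq_ftp` column holds semicolon-separated file paths; requesting `fastq_aspera` instead should give the corresponding Aspera paths. The helper names `ena_filereport_url` and `parse_fastq_links` are hypothetical, and the network call itself is left out.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify against the ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession: str, field: str = "fastq_ftp") -> str:
    """Build the filereport URL listing FASTQ file locations for one run."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": field,
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_links(tsv_text: str, field: str = "fastq_ftp") -> list[str]:
    """Extract file paths from a filereport TSV; empty if no FASTQ is listed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    links = []
    for row in reader:
        value = (row.get(field) or "").strip()
        # Paired-end runs list several files separated by ';'
        links.extend(v for v in value.split(";") if v)
    return links
```

Fetch the URL for your run accession with any HTTP client and pass the response text to `parse_fastq_links`; an empty list suggests the run is not available as FASTQ at ENA.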

Reply written 20 months ago by ATpoint (modified 20 months ago)