Download slices of SRA/FastQ files?
6.5 years ago
juara ▴ 40

Hello

I am interested in specific loci in the TARGET WGS data, so I am trying to download only those regions rather than the whole SRA files, which take forever to download. So far, the only approach I have found is to download the SRA files, convert them to BAM, extract the slice of interest, and delete the rest. The problem is that converting SRA to BAM takes a long time, and I have quite a number of files to process.

I know the GDC has a built-in BAM slicing function; however, it does not yet contain the complete TARGET data. I am looking for something similar on other platforms.

https://docs.gdc.cancer.gov/API/Users_Guide/BAM_Slicing/
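For reference, this is roughly what a GDC slicing request looks like; a minimal sketch, assuming the `slicing/view` endpoint and `X-Auth-Token` header described in the guide linked above. The helper name `slicing_url` and the UUID placeholder are hypothetical, and the request itself is left as a comment because it needs a valid token for controlled-access files.

```python
from urllib.parse import urlencode

# Base endpoint as described in the GDC BAM Slicing guide (assumed unchanged).
GDC_SLICING = "https://api.gdc.cancer.gov/slicing/view"


def slicing_url(file_uuid, regions):
    """Build a slicing URL for one BAM file and one or more regions.

    `regions` are strings such as "chr1", "chr2:10000" or "chr3:10000-20000";
    each one becomes a repeated `region=` query parameter.
    """
    query = urlencode([("region", r) for r in regions], safe=":")
    return f"{GDC_SLICING}/{file_uuid}?{query}"


# Hypothetical usage (placeholder UUID, token read from a local file):
#   import urllib.request
#   req = urllib.request.Request(
#       slicing_url("<file-uuid>", ["chr3:10000-20000"]),
#       headers={"X-Auth-Token": open("gdc-token.txt").read().strip()},
#   )
#   with urllib.request.urlopen(req) as resp, open("slice.bam", "wb") as out:
#       out.write(resp.read())
```

Only the requested regions are transferred, which is the behaviour I would like to reproduce for the SRA-hosted files.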

Is there an API for slicing SRA files on the dbGaP or SRA website, so that I do not have to download the whole WGS files?

Thanks

next-gen WGS sequencing data retrieval • 2.1k views

FASTQ files hold unaligned reads. They carry no positional information, so they cannot be subset by region before alignment. What you can do is check whether your data of interest are mirrored at the European Nucleotide Archive (ENA). They mirror most NCBI data as fastq instead of this terrible SRA junk, and offer download via Aspera (allowing downloads at up to 100Mb/s), which speeds up at least the data acquisition step. The alignment will still take time.
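To check whether a run is mirrored, something like the following could work; a rough sketch, assuming the ENA Portal API `filereport` endpoint and its `fastq_ftp` field. The function names are made up for illustration. An empty list for a run would mean no fastq is exposed for it, which may well be the case for controlled-access data.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API file report endpoint (assumed; check the ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build the file report URL for a run, experiment, or study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_bytes",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text):
    """Map each run accession to its list of fastq URLs (possibly empty)."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Paired-end runs list several files separated by ';'.
        paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs


# Hypothetical usage (requires network access):
#   import urllib.request
#   text = urllib.request.urlopen(filereport_url("SRR000001")).read().decode()
#   print(parse_filereport(text))
```

The returned URLs can then be handed to whatever download client you prefer.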
