Question: Extract SRA reads per RG using fasterq-dump
5 weeks ago, MAPK (1.7k) wrote:

I am trying to download dbGaP SRA samples. I used to use fastq-dump with the command below, but for this project fastq-dump runs very slowly because the datasets are larger. I wanted to switch to fasterq-dump, but I couldn't figure out how to split reads by read group (RG tag). I tried fasterq-dump with the same options, but it doesn't appear to have the defline options (--defline-seq, --defline-qual). Any suggestions?

This is the command I use with fastq-dump:

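prefetch --ngc /dbGaP/prj_222.ngc -X 9999999999999 ${SRR}
# Collect the @RG header lines, one array element per line
IFS=$'\n' RGLINES=($(sam-dump --ngc /dbGaP/prj_222.ngc ./${SRR} | sed -n '/^[^@]/!p;//q' | grep ^@RG))
for RGLINE in "${RGLINES[@]}"; do
    unset IFS
    # Assumed step (missing from the post as shown): split the @RG line so RG[1] holds the ID:<name> field
    RG=(${RGLINE})
    args+=(\>\(grep -A3 --no-group-separator \"\\.${RG[1]#ID:}/[12]$\" \| gzip \> "./${SRR}.${RG[1]#ID:}.fastq-dump.split.defline.z.tee.fq.gz"\))
done
echo "Splitting ${SRR}.sra into ${#RGLINES[@]} ReadGroups"
# Assumed: tee fans the stream out to the per-RG process substitutions collected in args
fastq-dump-orig.2.10.8 --ngc /dbGaP/prj_222.ngc --split-3 --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "./${SRR}" | eval tee ${args[@]} > /dev/null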
Tags: dbgap, fasterq-dump, sra
modified 5 weeks ago • written 5 weeks ago by MAPK (1.7k)
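
The command above runs one grep per read group over the whole stream. A minimal sketch of a single-pass alternative is below. It assumes 4-line FASTQ records and deflines shaped like `@<accession>.<spot>.<RG>/<read>`, which is what `--defline-seq '@$ac.$si.$sg/$ri'` is meant to produce. The function name `split_by_rg` and the output naming are hypothetical, and the sketch says nothing about whether fasterq-dump can emit such deflines; it only handles the splitting step for any stream that already has them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a FASTQ stream on stdin into one gzip file per
# read group, reading the stream once.
# Assumes 4-line records and deflines like "@SRR000001.17.groupA/1".
split_by_rg() {
    local prefix=$1   # output prefix, e.g. "./${SRR}"
    awk -v prefix="$prefix" '
        NR % 4 == 1 {
            rg = $0
            sub(/\/[12]$/, "", rg)             # drop the trailing /1 or /2
            sub(/^@[^.]*\.[^.]*\./, "", rg)    # drop "@<accession>.<spot>."
            gsub(/[^A-Za-z0-9._-]/, "_", rg)   # keep the file name shell-safe
            if (rg == "") rg = "noRG"          # reads without a read group
            out = "gzip > \"" prefix "." rg ".fq.gz\""
            seen[out] = 1
        }
        { print | out }                        # all 4 lines go to the current group
        END { for (cmd in seen) close(cmd) }   # wait for every gzip to finish
    '
}
```

It could be used as `fastq-dump ... -Z "./${SRR}" | split_by_rg "./${SRR}"`, which writes files such as `./${SRR}.<RG>.fq.gz`. One open gzip pipe per read group is kept for the duration of the run, which should be fine for the handful of read groups a typical run has.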

You should first download the SRA file with prefetch and then run fastq-dump on that local file. I am almost certain that fastq-dump alone will not manage to download large files without at least one connection error; prefetch is much more stable. See the last section of "Fast download of FASTQ files from the European Nucleotide Archive (ENA)".

written 5 weeks ago by ATpoint (42k)

I actually downloaded the SRA file with prefetch first and then passed it to fastq-dump as -Z "./${SRR}". I'm not sure whether this is the correct way to use the downloaded SRA folder.

modified 5 weeks ago • written 5 weeks ago by MAPK (1.7k)
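
On the question of how to point fastq-dump at the prefetch output: the sketch below is one way to pass the downloaded file explicitly instead of the folder. It assumes prefetch wrote the run into a directory named after the accession and that the directory contains a file ending in `.sra`; the exact file name may differ for dbGaP downloads, which is why the helper globs rather than hard-coding it. `find_sra_file` is a hypothetical name, not part of sra-tools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first *.sra file inside the given directory,
# or the directory itself when no such file exists.
# Assumption: prefetch created "<dir>/<something>.sra" for the accession.
find_sra_file() {
    local dir=$1 f
    for f in "$dir"/*.sra; do
        if [ -f "$f" ]; then       # an unmatched glob stays literal and fails this test
            printf '%s\n' "$f"
            return 0
        fi
    done
    printf '%s\n' "$dir"           # fall back to the folder path
}
```

For example, `fastq-dump ... -Z "$(find_sra_file "./${SRR}")"` would hand the tool the file when it is present and the folder otherwise.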