Extract SRA reads per RG using fasterq-dump
7 months ago
MAPK ★ 1.8k

I am trying to download dbGaP SRA samples. Until now I have used fastq-dump with the command below, but for this project it runs very slowly because the datasets are larger. I would like to switch to fasterq-dump, but I couldn't figure out how to split reads by RG (read group) tag. I tried fasterq-dump with the same options as below, but it doesn't appear to have the defline options. Any suggestions?

This is the command I use with fastq-dump:

# Download the run using the dbGaP repository key
prefetch --ngc /dbGaP/prj_222.ngc -X 9999999999999 ${SRR}
# Collect the @RG header lines (sed quits at the first non-header line)
IFS=$'\n'
RGLINES=($(sam-dump --ngc /dbGaP/prj_222.ngc ./${SRR} | sed -n '/^[^@]/!p;//q' | grep ^@RG))
# Build "tee >(grep ... | gzip > file) ... >/dev/null", one process substitution per read group
args=(tee)
for RGLINE in ${RGLINES[@]}; do
    unset IFS
    RG=(${RGLINE})
    args+=(\>\(grep -A3 --no-group-separator \"\\.${RG[1]#ID:}/[12]$\" \| gzip \> "./${SRR}.${RG[1]#ID:}.fastq-dump.split.defline.z.tee.fq.gz"\))
done
args+=(\>/dev/null)
echo "Splitting ${SRR}.sra into ${#RGLINES[@]} ReadGroups"
# Stream the FASTQ once and fan it out to the per-read-group filters
fastq-dump-orig.2.10.8 --ngc /dbGaP/prj_222.ngc --split-3 --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "./${SRR}" | eval ${args[@]}
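
Independent of which dump tool produces the stream, the fan-out step above could be done in a single pass instead of one grep per read group. The following is only a sketch: `split_fastq_by_rg` and its output naming are hypothetical, and it assumes 4-line FASTQ records whose sequence defline has the form `@<accession>.<spot id>.<read group>/<1|2>`, as produced by the `--defline-seq` format in the question. It also assumes the prefix and read group IDs contain no double quotes, `$`, or slashes, since they are pasted into a shell command. Whether your fasterq-dump build can emit such deflines (newer sra-tools releases list `--seq-defline` and `--qual-defline` in `fasterq-dump --help`) is something to check on your installed version.

```shell
# Split a FASTQ stream on stdin into one gzipped file per read group.
# $1 is a hypothetical output prefix, e.g. "./${SRR}".
split_fastq_by_rg() {
    local prefix="$1"
    awk -v prefix="$prefix" '
        NR % 4 == 1 {
            # Defline assumed to be "@ACC.SPOT.RG/READ": drop "@ACC.SPOT." and "/1" or "/2"
            rg = $0
            sub(/^@[^.]*\.[^.]*\./, "", rg)
            sub(/\/[12]$/, "", rg)
            out = "gzip > \"" prefix "." rg ".fq.gz\""
            seen[out] = 1
        }
        # All four lines of a record go to the pipe chosen by its defline
        { print | out }
        # Close every pipe so each gzip finishes before the function returns
        END { for (cmd in seen) close(cmd) }
    '
}
```

With the command from the question, usage would look like `fastq-dump-orig.2.10.8 ... -Z "./${SRR}" | split_fastq_by_rg "./${SRR}"`. Each read is then examined once rather than once per read group, and the read groups no longer need to be listed in advance with sam-dump.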
SRA fasterq-dump dbGAP • 386 views

You should use prefetch to download the SRA file first and then run fastq-dump on that file. I am almost certain that fastq-dump alone will not manage to download large files without at least one connection error; prefetch is much more stable. See the last section of "Fast download of FASTQ files from the European Nucleotide Archive (ENA)".


I actually downloaded the SRA file with prefetch first and then passed it to fastq-dump as -Z "./${SRR}". I'm not sure whether this is the correct way to use the downloaded SRA folder.
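
One way to see what is being passed to the dump tool is to look for the run file inside the folder prefetch created. This is a rough sketch only: `find_sra_path` is a hypothetical helper, and the layout it looks for (a folder named after the accession holding a file that starts with the accession and ends in `.sra` or a similar suffix) is an assumption about the download, not something confirmed in this thread.

```shell
# Print the first non-empty run file found for an accession, or fail.
# $1 = accession (e.g. "$SRR"), $2 = directory prefetch was run in (defaults to ".").
find_sra_path() {
    local acc="$1"
    local base="${2:-.}"
    local f
    # Assumed layouts: <base>/<acc>/<acc>*.sra* or <base>/<acc>*.sra*
    for f in "${base}/${acc}/${acc}"*.sra* "${base}/${acc}"*.sra*; do
        # An unmatched glob stays literal, so -s also filters those out
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no non-empty .sra file found for ${acc} under ${base}" >&2
    return 1
}
```

For example, `find_sra_path "${SRR}"` run from the download directory would print the file that `./${SRR}` resolves to under these assumptions, and an error status would suggest the download is missing or incomplete.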
