SRA to paired fastq per read group
4.1 years ago
MAPK ★ 2.1k

I am trying to download SRA data and create paired-end FASTQ files per read group. Can someone please share how to do this? I would really appreciate a shell script.

I tried the following, which only splits the FASTQ per read group (RG); I also need to split each RG into FQ1 and FQ2.

SRR="SRR1350739"
IFS=$'\n'
# Collect the @RG header lines; sed stops at the first non-header line.
RGLINES=($(sam-dump --ngc XXXX.ngc "./${SRR}" | sed -n '/^[^@]/!p;//q' | grep '^@RG'))
unset IFS
# Build one tee target per read group: a process substitution that greps
# that RG's 4-line records and gzips them.
args=(tee)
for RGLINE in "${RGLINES[@]}"; do
  RG=(${RGLINE})  # split on whitespace; RG[1] is the ID:<name> field
  args+=(\>\(grep -A3 --no-group-separator \"\\.${RG[1]#ID:}/[12]$\" \| gzip \> "./${SRR}.${RG[1]#ID:}.fastq-dump.split.defline.z.tee.fq.gz"\))
done

echo "Splitting ${SRR} into ${#RGLINES[@]} ReadGroups"
fastq-dump --ngc XXXX.ngc --split-e --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "${SRR}" | eval "${args[@]}"
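
One possible way to extend the pipeline above so that each mate also gets its own file is to replace the tee/grep fan-out with a single awk pass. This is only a sketch. It assumes the stream consists of strict 4-line FASTQ records whose headers follow the '@$ac.$si.$sg/$ri' defline used above, and that read group names are safe to use in file names. The function name and the PREFIX.RG_1.fq / PREFIX.RG_2.fq naming scheme are made up for illustration.

```shell
# Hypothetical helper: read FASTQ on stdin and write one file per
# read group and mate, named <prefix>.<RG>_<mate>.fq.
# Assumes 4-line records and headers shaped like @ACCESSION.SPOT.RG/MATE.
split_fastq_by_rg_and_mate() {
  awk -v prefix="$1" '
    NR % 4 == 1 {
      # Locate the trailing "/1" or "/2" mate suffix of the header line.
      slash = match($0, /\/[12]$/)
      if (slash == 0) {
        # Header does not look as assumed; keep the record in a catch-all file.
        out = prefix ".unassigned.fq"
      } else {
        mate = substr($0, slash + 1)
        id = substr($0, 2, slash - 2)        # "ACCESSION.SPOT.RG"
        sub(/^[^.]*\.[^.]*\./, "", id)      # drop "ACCESSION.SPOT."
        out = prefix "." id "_" mate ".fq"
      }
    }
    # Every line of the record goes to the file chosen from its header.
    { print > out }
  '
}
```

It would be used in place of the eval step, for example `fastq-dump ... -Z "${SRR}" | split_fastq_by_rg_and_mate "./${SRR}"`, followed by gzip on the resulting files. With very many read groups, some awk implementations may hit a limit on simultaneously open files.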
4.1 years ago
GenoMax 146k

Use bamtofastq from biobambam2. It can write the data into read-group-specific files.
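
A rough sketch of how that could look is below. It assumes that sam-dump, samtools and bamtofastq are all on the PATH, and that the SAM produced by sam-dump carries @RG headers and per-read RG tags (the header parsing in the question suggests it does). The function name, the intermediate BAM file and the output directory layout are hypothetical choices, not something prescribed by the tools.

```shell
# Hypothetical wrapper: dump an SRA run to BAM, then let bamtofastq write
# one set of FASTQ files per read group, with mates in separate files.
# Usage: sra_to_rg_fastq <SRR accession> <ngc file> <output directory>
sra_to_rg_fastq() {
  srr=$1
  ngc=$2
  outdir=$3
  mkdir -p "$outdir" || return 1

  # Convert the run to BAM; --unaligned keeps reads that have no alignment.
  sam-dump --ngc "$ngc" --unaligned "$srr" \
    | samtools view -b -o "$outdir/$srr.bam" - || return 1

  # outputperreadgroup=1 makes bamtofastq name the files after each RG ID,
  # using the suffixes given here for first mates (F), second mates (F2),
  # orphans (O, O2) and single-end reads (S).
  bamtofastq \
    filename="$outdir/$srr.bam" \
    collate=1 \
    gz=1 \
    outputperreadgroup=1 \
    outputdir="$outdir" \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \
    outputperreadgroupsuffixS=_s.fq.gz
}
```

For the run in the question this would be called as `sra_to_rg_fastq SRR1350739 XXXX.ngc ./SRR1350739_by_rg`. The intermediate BAM can be deleted afterwards. Check the resulting file names against the @RG IDs in the header before relying on them.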
