Parallel and Trim Galore
3.5 years ago
jjp55 ▴ 20

Hi all.

I am trying to get my code to work, but it appears to have a bug somewhere: Trim Galore is not reading the input files the way I want it to. I have paired-end sequencing reads.

My code is:

parallel --plus 'trim_galore --stringency 3 --paired {...}.fastq.gz {...}R2.fastq.gz' ::: *fastq.gz

My code is able to read the R1, the first of the set, but then it can't read the second. It tries to read the R2 as a file with R1R2.fastq.gz at the end rather than R2.fastq.gz. Any help is greatly appreciated.

ChIP-Seq sequence next-gen • 1.6k views
3.5 years ago
ATpoint 82k

I personally prefer to only give basenames to parallel and do the "name matching thing" outside of it. That makes it (for me) easier to manipulate the name string as I like.

For example:

# dummy data
touch foo.R1.fastq.gz
touch foo.R2.fastq.gz
touch bar.R1.fastq.gz
touch bar.R2.fastq.gz

$ ls *fastq.gz
bar.R1.fastq.gz bar.R2.fastq.gz foo.R1.fastq.gz foo.R2.fastq.gz

# now extract the basenames:
$ ls *R1.fastq.gz | awk -F ".R1.fastq.gz" '{print $1}'
bar
foo

# together with parallel (sort -u to ensure no duplicates)
ls *.R1.fastq.gz \
| awk -F ".R1.fastq.gz" '{print $1}' \
| sort -u \
| parallel "trim_galore --stringency 3 --paired {}.R1.fastq.gz {}.R2.fastq.gz"
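
If you would rather start from the R1 file names themselves, the R2 name can also be derived with plain shell parameter expansion. The following is only a minimal sketch, assuming every R1 file ends in "R1.fastq.gz" and its mate differs only in that suffix. The function names `mate_of` and `print_trim_commands` are made up for illustration, and the script only prints the commands (a dry run) rather than running Trim Galore.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the R2 mate's name from an R1 file name by
# swapping the trailing "R1.fastq.gz" for "R2.fastq.gz".
# Assumption: this suffix scheme matches your files.
mate_of() {
    local r1="$1"
    printf '%s\n' "${r1%R1.fastq.gz}R2.fastq.gz"
}

# Hypothetical helper: print one trim_galore command per R1 file given as
# an argument. Nothing is executed, the commands are only echoed.
print_trim_commands() {
    local r1 r2
    for r1 in "$@"; do
        r2="$(mate_of "$r1")"
        echo "trim_galore --stringency 3 --paired $r1 $r2"
    done
}

# Example with the dummy names from above and an underscore-style name
print_trim_commands foo.R1.fastq.gz bar_R1.fastq.gz
```

Once the printed commands look right, they could be run in parallel with `print_trim_commands *R1.fastq.gz | parallel`, since parallel treats each input line as a command when it is given no command of its own. File names containing spaces would need extra quoting.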