Question: Parallel and Trim Galore
3 days ago, jjp55 wrote:

Hi all.

I am trying to get my code to work, but it has a bug somewhere: Trim Galore is not receiving the file names the way I want it to. I have paired-end sequencing reads.

My code is:

parallel --plus 'trim_galore --stringency 3 --paired {...}.fastq.gz {...}R2.fastq.gz' ::: *fastq.gz

My code is able to read R1, the first file of each pair, but it can't read the second. It looks for the R2 file under a name ending in R1R2.fastq.gz rather than R2.fastq.gz. Any help is greatly appreciated.

3 days ago, ATpoint wrote:

I personally prefer to give only basenames to parallel and do the "name matching thing" outside of it. That makes it easier (for me) to manipulate the name string as I like.

For example:

# dummy data
touch foo.R1.fastq.gz
touch foo.R2.fastq.gz
touch bar.R1.fastq.gz
touch bar.R2.fastq.gz

$ ls *fastq.gz
bar.R1.fastq.gz bar.R2.fastq.gz foo.R1.fastq.gz foo.R2.fastq.gz

# now extract the basenames:
$ ls *R1.fastq.gz | awk -F ".R1.fastq.gz" '{print $1}'
bar
foo

# together with parallel (sort -u to ensure no duplicates)
ls *.R1.fastq.gz \
| awk -F ".R1.fastq.gz" '{print $1}' \
| sort -u \
| parallel "trim_galore --stringency 3 --paired {}.R1.fastq.gz {}.R2.fastq.gz"
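
If you would rather derive the mate name from the R1 file name directly, the same idea can be sketched in plain bash with parameter expansion. This is only a rough sketch, assuming every R1 file ends in exactly "R1.fastq.gz"; `mate_for` and `pair_commands` are hypothetical helper names made up for illustration, not part of parallel or Trim Galore. It only prints the commands, so you can check the pairing before anything is run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the R2 file name from an R1 file name.
# Assumes the name ends in "R1.fastq.gz".
mate_for() {
    local r1=$1
    # Strip the trailing "R1.fastq.gz", then append "R2.fastq.gz".
    printf '%s\n' "${r1%R1.fastq.gz}R2.fastq.gz"
}

# Hypothetical helper: print one trim_galore command per R1 file.
# Nothing is executed here; the commands are only written to stdout.
pair_commands() {
    local r1 r2
    for r1 in "$@"; do
        r2=$(mate_for "$r1")
        printf 'trim_galore --stringency 3 --paired %s %s\n' "$r1" "$r2"
    done
}

# Example with the dummy names from above.
pair_commands foo.R1.fastq.gz bar.R1.fastq.gz
```

Once the printed lines look right, they could be piped into parallel, which runs each input line as a command when it is given no command of its own (for example `pair_commands *R1.fastq.gz | parallel`). File names containing spaces would need extra quoting.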