Question: TrimGalore! on multiple paired fastq files
1
3.1 years ago, emblake50 (United States) wrote:

I have 60 PE fastq files that I would like to batch process using TrimGalore! I know a for...in loop would best serve my purpose, but I don't think I'm setting it up correctly. Would someone more experienced with scripting assist? Thank you!

File format: SMXX_R1_merged.fastq.gz, SMXX_R2_merged.fastq.gz

#!/bin/bash
for f1 in *_R1_merged.fastq.gz
do
        # Derive the R2 mate by swapping the read suffix
        f2="${f1%_R1_merged.fastq.gz}_R2_merged.fastq.gz"
        trim_galore --illumina --paired --fastqc -o trim_galore/ "$f1" "$f2"
done
modified 9 months ago by rogangrant202210 • written 3.1 years ago by emblake50
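For what it's worth, the loop in the question is essentially sound as posted. Below is a slightly more defensive sketch of the same idea: it quotes the file names, skips any R1 file whose R2 mate is missing, and can print the commands instead of running them. The names mate_of, run_pairs and DRY_RUN are made up for this illustration; they are not part of Trim Galore.

```shell
#!/bin/bash
# Sketch only: mate_of, run_pairs and DRY_RUN are hypothetical names.

# Print the R2 file name that belongs to a given R1 file name.
mate_of() {
    printf '%s\n' "${1%_R1_merged.fastq.gz}_R2_merged.fastq.gz"
}

# Trim every R1/R2 pair in the current directory into the given output directory.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
run_pairs() {
    outdir=$1
    mkdir -p "$outdir"
    for r1 in *_R1_merged.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2=$(mate_of "$r1")
        if [ ! -f "$r2" ]; then
            echo "skipping $r1: mate $r2 not found" >&2
            continue
        fi
        ${DRY_RUN:+echo} trim_galore --illumina --paired --fastqc -o "$outdir" "$r1" "$r2"
    done
}

# Example (prints the commands only): DRY_RUN=1 run_pairs trim_galore
```

Running it once with DRY_RUN=1 is a cheap way to confirm that all 60 pairs are matched as expected before starting the real trimming.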
3

You can run with GNU parallel.

find  path_to_fastq  -name "*_R1_merged.fastq.gz" | cut -d "_" -f1 | parallel -j 1 trim_galore --illumina --paired --fastqc -o trim_galore/ {}\_R1_merged.fastq.gz {}\_R2_merged.fastq.gz
modified 3.1 years ago • written 3.1 years ago by GZ1995350
1

You have a typo: *parallel ;-)

It might be best to include a link as well: https://www.gnu.org/software/parallel/

written 3.1 years ago by WouterDeCoster42k

I've installed GNU parallel and run:

find  /path/to/fastq  -name "*_R1_merged.fastq.gz" | cut -d "_" -f1 | parallel -j 1 trim_galore --illumina --paired --fastqc -o trim_galore/ {}\_R1_merged.fastq.gz {}\_R2_merged.fastq.gz

but it fails with:

gzip: /path/to/fastq/trim_R1_merged.fastq.gz: No such file or directory
Input file '/path/to/fastq/trim_R1_merged.fastq.gz' seems to be completely empty. Consider respecifying!

Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Failed to write to file 'trim_R1_merged.fastq.gz_trimming_report.txt': No such file or directory
1.11

I can see that the file naming convention is incorrect, but I'm not sure how to fix it.

modified 3.1 years ago • written 3.1 years ago by emblake50

It looks like the path is not correct. Are you sure /path/to/fastq is the directory that contains your gz files? Can you post the output of: find /path/to/fastq -name "*_R1_merged.fastq.gz" | cut -d "_" -f1

modified 3.1 years ago • written 3.1 years ago by GZ1995350

I checked the path, and there was an issue with a subfolder named 'trim_galore'. I corrected the error, and it seems to be executing just fine now. Thanks very much!

written 3.1 years ago by emblake50
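A note on why the 'trim_galore' subfolder probably caused this: cut -d "_" -f1 cuts at the first underscore anywhere in the path, not just in the file name, so any directory with an underscore in its name truncates the prefix. The sketch below reproduces that with a hypothetical file path (the sample name SM01 and its location are assumptions, chosen to match the error message) and shows a suffix-stripping alternative using sed.

```shell
# Hypothetical path, chosen to mimic a file sitting under a 'trim_galore' subfolder.
path='/path/to/fastq/trim_galore/SM01_R1_merged.fastq.gz'

# cut splits on the FIRST underscore anywhere in the path:
cut_prefix=$(printf '%s\n' "$path" | cut -d "_" -f1)
echo "$cut_prefix"    # /path/to/fastq/trim

# Stripping the known suffix leaves the rest of the path intact:
safe_prefix=$(printf '%s\n' "$path" | sed 's/_R1_merged\.fastq\.gz$//')
echo "$safe_prefix"   # /path/to/fastq/trim_galore/SM01

# The same sed call can stand in for cut in the pipeline, for example:
#   find /path/to/fastq -maxdepth 1 -name "*_R1_merged.fastq.gz" \
#     | sed 's/_R1_merged\.fastq\.gz$//' \
#     | parallel -j 1 trim_galore --illumina --paired --fastqc -o trim_galore/ {}_R1_merged.fastq.gz {}_R2_merged.fastq.gz
```

The -maxdepth 1 option in the commented pipeline keeps find out of subfolders; it is supported by GNU and BSD find but is not part of POSIX, so treat it as an assumption about your system.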
2
3.1 years ago, ole.tange3.6k (Denmark) wrote:
parallel trim_galore --illumina --paired --fastqc -o trim_galore/ {} {=s/_R1_/_R2_/=} ::: *_R1_merged.fastq.gz
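For readers new to GNU parallel: {= ... =} is a replacement string that runs a Perl expression on the input value, so {=s/_R1_/_R2_/=} turns each R1 file name into its R2 mate. A minimal sketch of the same substitution in plain shell, using sed and a hypothetical file name, is handy for checking what the second argument will look like:

```shell
# Hypothetical R1 file name, used only for illustration.
r1='SM01_R1_merged.fastq.gz'

# Replace the first "_R1_" with "_R2_", as the s/_R1_/_R2_/ expression does.
r2=$(printf '%s\n' "$r1" | sed 's/_R1_/_R2_/')
echo "$r2"    # SM01_R2_merged.fastq.gz
```

Only the first _R1_ is replaced, so a sample name that itself contains _R1_ would need a more specific pattern.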
1
9 months ago, rogangrant202210 wrote:

Alternatively, GNU parallel can easily handle multiple inputs:

parallel --xapply trim_galore --illumina --paired --fastqc -o trim_galore/ ::: *_R1_merged.fastq.gz ::: *_R2_merged.fastq.gz

Note that the --xapply flag pairs the two input lists by position, so each R1 file is run with its R2 mate. Without it, parallel runs every combination of files from the two lists, which is not what you want.

modified 9 months ago • written 9 months ago by rogangrant202210
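One caveat with pairing by position: it relies on the two globs expanding to lists of the same length in matching order, so a single missing R2 file can shift every later pair. A small sanity check before launching the jobs reduces that risk. This is only a sketch; pairs_line_up is a hypothetical helper name, not a GNU parallel or Trim Galore feature.

```shell
# Sketch only: pairs_line_up is a hypothetical helper.
# Succeeds when every R1 file in the directory has an R2 mate
# and the numbers of R1 and R2 files are equal and non-zero.
pairs_line_up() {
    dir=${1:-.}
    n1=0
    n2=0
    for r1 in "$dir"/*_R1_merged.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        n1=$((n1 + 1))
        if [ ! -f "${r1%_R1_merged.fastq.gz}_R2_merged.fastq.gz" ]; then
            echo "missing R2 mate for $r1" >&2
            return 1
        fi
    done
    for r2 in "$dir"/*_R2_merged.fastq.gz; do
        [ -e "$r2" ] || continue
        n2=$((n2 + 1))
    done
    [ "$n1" -gt 0 ] && [ "$n1" -eq "$n2" ]
}

# Example: pairs_line_up . && echo "R1 and R2 lists match"
```

Adding --dry-run to the parallel command prints the commands it would run without executing them, which is another quick way to eyeball the pairing. As far as I know, newer GNU parallel releases document --xapply under the name --link.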