Using trimmomatic on multiple single-end read files
7.1 years ago

I need help writing a for loop to run Trimmomatic for quality trimming of single-end FASTQ files, so that the same command is executed on each of many files. I read the answers to a similar question about paired-end data, but they did not help me much. Any help would be appreciated. Thanks!

software error Assembly next-gen sequencing genome • 10k views
7.1 years ago

This is much simpler than for paired-end (PE) files:

Shell:

for file in *.fq.gz; do
    # do something with the file
    echo "$file"
done
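
To fill in the loop body for Trimmomatic, a minimal sketch might look like the following. It assumes the inputs end in .fq.gz as in the loop above; the function name dry_run_trim and the output suffix .trimmed.fq.gz are made up for illustration, and the trimming steps are copied from the command used elsewhere in this thread. The echo in front of java means the commands are only printed, not executed.

```shell
# dry_run_trim DIR: print the Trimmomatic SE command for every *.fq.gz in DIR.
# Hypothetical helper; the jar name and output suffix are placeholders to adapt.
dry_run_trim() {
    dir="${1:-.}"
    for file in "$dir"/*.fq.gz; do
        [ -e "$file" ] || continue               # nothing matched the pattern
        out="${file%.fq.gz}.trimmed.fq.gz"       # strip .fq.gz, add the new suffix
        echo java -jar trimmomatic-0.35.jar SE -phred33 "$file" "$out" \
            ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    done
}

dry_run_trim .
```

Once the printed commands look right, removing the echo makes the loop run them one after another.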

GNU Parallel (see "Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them"):

ls *.fq.gz | parallel 'echo {}'
7.1 years ago
st.ph.n ★ 2.7k

Make a bash script with your trimmomatic command:

#!/usr/bin/bash

java -jar trimmomatic-0.35.jar SE -phred33 "$1" "$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Here $1 is your input file, and basename removes any leading directory and the .fastq.gz extension, so that the suffix .trimmomatic_out.fastq.gz can be appended in its place. Save the script as run_all_trim.sh.
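
To see what the output-name expression produces, it can be tried on its own. This is only an illustration; the two sample file names are made up.

```shell
# Evaluate the output-name expression for a few made-up input names.
for input in sample1.fastq.gz raw/sample2.fastq.gz; do
    output="$(basename "$input" .fastq.gz).trimmomatic_out.fastq.gz"
    echo "$input -> $output"
done
# prints:
# sample1.fastq.gz -> sample1.trimmomatic_out.fastq.gz
# raw/sample2.fastq.gz -> sample2.trimmomatic_out.fastq.gz
```

Note that basename also drops the raw/ directory, so the trimmed file is written to the current directory unless a directory is put in front of the output name.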

List all of your single-end files in a file, one per line: SE_files.txt. If they are all in one directory: ls -1 *.fastq.gz > SE_files.txt. Then pass each file to the trimmomatic script:

cat SE_files.txt | xargs -n 1 bash run_all_trim.sh
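
If it is unclear what xargs -n 1 does here, the following sketch shows it with echo standing in for the real script, so nothing is actually trimmed. The file names in the list are made up for illustration.

```shell
# Show how `xargs -n 1` turns each line of a list into one separate call.
# `echo` stands in for the real command; the file names are made up.
list=$(mktemp)
printf '%s\n' raw/a.fastq.gz raw/b.fastq.gz > "$list"
calls=$(xargs -n 1 echo "would run: bash run_all_trim.sh" < "$list")
echo "$calls"
rm -f "$list"
# prints:
# would run: bash run_all_trim.sh raw/a.fastq.gz
# would run: bash run_all_trim.sh raw/b.fastq.gz
```

Each line of the list becomes the $1 of one call to the script. By default xargs splits its input on whitespace, so this assumes file names without spaces.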

If you have a lot of files and want the job to run in the background and survive a hangup (for example, when you log out):

nohup xargs -n 1 bash run_all_trim.sh < SE_files.txt &

Use htop or top to check periodically that it is still running.


Hi, your comment is getting old but was very useful. However, could you explain how it works? I don't get it.

I wrote this:

trimmomatic SE -threads 16 -phred33 $1 "/trimmomatic/`basename $1 .fastq.gz`.trimmomatic_out.fastq.gz" \

It takes my files in ./raw as input and sends the output to ./trimmomatic. This was my intention, but how does it know to use ./raw as input and not just ./ ?


List your files in a text file. If they are all in a folder called raw and you run the command from the directory that contains it, the filenames in SE_files.txt would be of the form raw/prefix.fastq.gz. The script opens whatever path it receives as $1, which is how it finds the files in raw. The point is that you put the command in a bash script and then loop through the text file, passing one line (file) at a time.
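
To make the input/output distinction concrete: the input path is used exactly as listed, while the output name is rebuilt from the bare file name. A small sketch, where out_path is a hypothetical helper and trimmomatic is assumed as the default output directory:

```shell
# out_path INPUT [OUTDIR]: where the trimmed copy of INPUT would be written.
# Hypothetical helper for illustration; OUTDIR defaults to "trimmomatic".
out_path() {
    input="$1"
    outdir="${2:-trimmomatic}"
    echo "$outdir/$(basename "$input" .fastq.gz).trimmomatic_out.fastq.gz"
}

# The raw/ part of the input is dropped by basename; only OUTDIR decides
# where the result goes.
out_path raw/prefix.fastq.gz                # trimmomatic/prefix.trimmomatic_out.fastq.gz
out_path raw/prefix.fastq.gz /tmp/results   # /tmp/results/prefix.trimmomatic_out.fastq.gz
```

In a script like the one above, "$1" would stay as the input argument and "$(out_path "$1")" would be used as the output argument.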

Similarly you can write a bash script, as shenwei pointed out above, where you can do:

#!/usr/bin/bash
for file in raw/*.fastq.gz; do
    # print the file name so it shows up in the log
    echo "$file"
    java -jar trimmomatic-0.35.jar SE -phred33 "$file" "$(basename "$file" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done

You can save this as run_trim.sh, and run in the background with:

nohup bash run_trim.sh > log.txt &

Each file name is written to the log file when its processing starts, so you can track the progress. While it's running, you can use:

wc -l log.txt

to see how far it has got compared with the total number of files (ls -1 raw/*.fastq.gz | wc -l).
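
The two counts can also be combined into one small helper. This is a sketch under an assumption: the log may contain messages from Trimmomatic itself in addition to the echoed file names, so only lines ending in .fastq.gz are counted. The function name progress is made up.

```shell
# progress LOG DIR: report how many input files have been started so far.
# Hypothetical helper; it counts only log lines that end in .fastq.gz, on the
# assumption that other lines in the log are program messages.
progress() {
    log="$1"
    dir="$2"
    started=$(grep -c '\.fastq\.gz$' "$log")
    total=$(ls -1 "$dir"/*.fastq.gz 2>/dev/null | wc -l | tr -d ' ')
    echo "$started of $total files started"
}
```

For the setup above, it would be called as progress log.txt raw.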
