Using Trimommatic to only clip adapters on multiple fastq files.
2
1
Entering edit mode
5.7 years ago
imesi ▴ 20

Hello everyone, newbie question here. I have 9 paired-end RNA-seq fastq files. I have done some qc with FastQC. My reads are of good quality as everything has the green check mark except the adapters. I have some questions, to clip the adapters or not to, and is it possible to write the trimmomatic script to only clip adapters, i.e no trimming? Also, where I am going wrong in the script below? When I run it, it runs foe 1 second and gives a "completed" exit code.

Many thanks

for R1 in *R1*
do
   R2=${R1//R1_001_.fastq.gz/R2_001_.fastq.gz}
   R1paired=${R1//.fastq.gz/_paired.fastq.gz}
   R1unpaired=${R1//.fastq.gz/_unpaired.fastq.gz}
   R2paired=${R2//.fastq.gz/_paired.fastq.gz}
   R2unpaired=${R2//.fastq.gz/_unpaired.fastq.gz}
echo "java -jar /c1/apps/trimmomatic/Trimmomatic-0.33/trimmomatic-0.33.jar PE
 -phred33 -trimlog Trimmed.txt $R1 $R2 $R1paired $R1unpaired $R2paired $R2unpaired
 ILLUMINACLIP:/c1/apps/trimmomatic/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:30:10"
>> trimmomatic.cmds
done
rna-seq Trimmomatic • 4.1k views
ADD COMMENT
1
Entering edit mode
5.7 years ago
h.mon 35k

I see two errors on your loop:

1) I have no idea of what this line does, but apparently it just assigns the value stored in $R1 to $R2:

R2=${R1//R1_001_.fastq.gz/R2_001_.fastq.gz}

2) with echo you are not running anything, you are just echoing the text to standard output (stdout), and with >> you are redirecting and appending stdout to the file "trimmomatic.cmds" - you may inspect it with less trimmomatic.cmds and see how your trimmomatic would be, in case you remove the echo "".

Trimmomatic loops are popular... have a look at these posts for inspiration:

automated script to trim paired end RNA seq reads using Trimmomatic

Linux for loop for 2 inputs and 4 outputs for Trimmomatic fastq quality trimming

Trimmomatic job script to run on multiple pair end read file

Using trimmomatic on multiple paired-end read files

How to read a loop over to read a file in a folder for many folders

ADD COMMENT
0
Entering edit mode

Thank you h.mon. I'll take a look at the links.

ADD REPLY
0
Entering edit mode

Hi h.mon, Could you please take a look at my script and tell me where I am wrong?

module load trimmomatic

for f1 in *.fastq.gz
do
   f2=${f1%%1.fastq.gz}".fastq.gz"

   java -jar /<path to trimmomatic> PE -threads 16 \
     -phred33 -trimlog Trimlog \
    $f1 \
    $f2 \
    adapterRemoved_Paired.fastq.gz \
    adapterRemoved_Unpaired.fastq.gz \
    adapterRemoved_Paired.fastq.gz \
    adapterRemoved_Unpaired.fastq.gz \
    ILLUMINACLIP: <path to trimmomatic>trimmomatic/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:30:10 \
    LEADING:3 TRAILING:3 MINLEN:36 \

done

This is the error message I get the error message below:

TrimmomaticPE: Started with arguments: -threads 16 -phred33 -trimlog Trimlog Control-1_R1_001.fastq.gz Control-1_R1_00.fastq.gz adapterRemoved_Paired.fastq.gz adapterRemoved_Unpaired.fastq.gz adapterRemoved_Paired.fastq.gz adapterRemoved_Unpaired.fastq.gz ILLUMINACLIP:/c1/apps/trimmomatic/Trimmomatic-0.33/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.io.FileNotFoundException: C-1_R2_00.fastq.gz  (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
        at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:255)
        at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:499)
        at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)

Thank you

ADD REPLY
0
Entering edit mode

Are you editing the error message? The command line shows the input fastq names are Control-1_R1_001.fastq.gz Control-1_R1_00.fastq.gz, but the error message says:

C-1_R2_00.fastq.gz (No such file or directory)

Again, I see two errors in your loop:

1) you are globing both R1 and R2 filenames with for f1 in *.fastq.gz, you want something like:

for f1 in *_R1_001.fastq.gz

2) your manipulation of the value of $f1 here is wrong, hence you are passing the non-existing file Control-1_R1_00.fastq.gz, instead of the correct Control-1_R2_001.fastq.gz. You have to modify f2=${f1%%1.fastq.gz}".fastq.gz" to:

f2=${f1%%_R1_001.fastq.gz}"_R2_001.fastq.gz"
ADD REPLY
0
Entering edit mode

Thank you! Yes, I see where my errors are coming from now. Also, I did attempt to de-identify the sample by editing the name in the output file.Again, thanks a lt, you have been a ton if help.

ADD REPLY
0
Entering edit mode
5.7 years ago

You can try a script that I have written sometime back. Hosted on github

The script can be used for running trimmomatic automatically for N no.of samples regardless of the file extension. See this github link to understand the functionalities, usage and reasons for failure.

ADD COMMENT
0
Entering edit mode

Thank you, Vijay. I have looked at the links you provided, including the one about the functionalities.

My files are .fq.gz. Also, I have 9 PE files (total of 18). Does this make N=9 or N=18? Also based on your script, it should be run as sh auto_trim.sh *.fq.gz but I do not understand the following:

red=`tput setaf 1`
green=`tput setaf 2`
yellow=`tput setaf 3`
reset=`tput sgr0`
count=0

Also, is this the beginning of the script?

if [[ ( $1 == '-h' ) || ( $1 == '--help') ]] ;then
 usage
elif [[ $# -eq 0 ]] ;then
 echo "${red}Error: No parameter(s) provided${reset}"
 usage
 exit 0
else
ADD REPLY
0
Entering edit mode

My files are .fq.gz. Also, I have 9 PE files (total of 18). Does this make N=9 or N=18?

Here, N = 9. The script automatically considers paired end files (R1 and R2) based on file names

Also based on your script, it should be run as sh auto_trim.sh *.fq.gz but I do not understand the following: red=tput setaf 1 green=tput setaf 2 yellow=tput setaf 3 reset=tput sgr0 count=0

I was trying to add some visual appeal to the command line using tput command (may sound foolish). It has nothing to do regarding trimmomatic; it's just adding color to the help/error message fonts. See this

color

Also, is this the beginning of the script? if [[ ( $1 == '-h' ) || ( $1 == '--help') ]] ;then usage elif [[ $# -eq 0 ]] ;then echo "${red}Error: No parameter(s) provided${reset}" usage exit 0 else

That for invoking help message as you can see in the image above.

You just need to run the script in the folder having your fastq files like this

$ sh auto_trimmomatic.sh *.fq.gz

I am using this on a daily basis and it works as expected for me. You need to change the parameters for trimmomatic from within the script. Let me know if you face any issue.

ADD REPLY
0
Entering edit mode

Thanks. Quick question: what is the syntax for setting parameters, and is this for SE and PE?

ADD REPLY
0
Entering edit mode

This is PE

All the parameters are in this section of the code

PE \
-threads 10 \
-phred33 $R1 $R2 $R1_pair $R1_unpair $R2_pair $R2_unpair \
HEADCROP:7 \
ILLUMINACLIP:/data/results/adapter.fasta:2:30:8 \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:20:20 \
MINLEN:40

You need to simply delete the line for the param which is not required or edit the value within the code. For example, say you dont need to HEADCROP the fastq, so the modified version will be (notice that I deleted "HEADCROP" line) :

PE \
-threads 10 \
-phred33 $R1 $R2 $R1_pair $R1_unpair $R2_pair $R2_unpair \
ILLUMINACLIP:/data/results/adapter.fasta:2:30:8 \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:20:20 \
MINLEN:40

or say you need more threads (say 20 threads) to run trimmomatic, the modified version would be (NOTICE that I changed the threads from 10 to 20) :

PE \
-threads 20\
-phred33 $R1 $R2 $R1_pair $R1_unpair $R2_pair $R2_unpair \
HEADCROP:7 \
ILLUMINACLIP:/data/results/adapter.fasta:2:30:8 \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:20:20 \
MINLEN:40

Trimmomatic manual (page #2) says that the different processing steps occur in the order in which the steps are specified on the command line. command line

Changing within the script may not be a good way, but that's how it is for now. Give it a try.

ADD REPLY
1
Entering edit mode

Thanks a lot., I guess my question was not clear enough. I was wondering if parameters were included in your script.

ADD REPLY
1
Entering edit mode

My pleasure. Just like to mention that you should upvote or accept an answer if that helped. This not only encourages the person helping you but also help other people looking for solutions to similar problems.

enter image description here

ADD REPLY
0
Entering edit mode

Upvote done! Thanks again.

ADD REPLY
0
Entering edit mode

Hi @imesi

The post having the answer which worked should be upvoted or/and accepted as answer(you can do both). I this case if multiple posts helped, please upvote and accept as answer. In my case, that post is the one where I originally provided the link to the code.

ADD REPLY

Login before adding your answer.

Traffic: 2414 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6