Question: Using Trimmomatic on multiple paired-end read files
bioinformaticssrm2011 (India) wrote, 2.3 years ago:

I need help writing a for loop to run Trimmomatic for quality trimming of paired-end FASTQ files, so that I can run the same command on all of my files.

The input PE files are:

C1_R1.fastq  C1_R2.fastq
C2_R1.fastq  C2_R2.fastq
C3_R1.fastq  C3_R2.fastq
T1_R1.fastq  T1_R2.fastq
T2_R1.fastq  T2_R2.fastq
T3_R1.fastq  T3_R2.fastq

To run Trimmomatic on one pair, C1_R1.fastq and C1_R2.fastq, the following command works:

java -jar ~/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 C1_R1.fastq C1_R2.fastq C1_R1_paired.fastq C1_R1_unpaired.fastq C1_R2_paired.fastq C1_R2_unpaired.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35

I have tried following this thread, but I don't understand it.

Any help would be appreciated. Thanks!

(last modified 5 weeks ago by Vijay Lakhujani)
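A minimal Python sketch of such a loop, assuming all twelve FASTQ files are in the current working directory and named <sample>_R1.fastq / <sample>_R2.fastq as listed above. The jar path and trimming steps are copied from the single-sample command; build_command and samples_in are hypothetical helper names. Because the arguments are passed as a list, no shell is involved, so "~" is expanded explicitly with os.path.expanduser.

```python
import glob
import os
import subprocess

# Jar location taken from the question; expanduser is needed because no shell
# is involved, so "~" would not be expanded otherwise.
TRIMMOMATIC_JAR = os.path.expanduser("~/Trimmomatic-0.36/trimmomatic-0.36.jar")
STEPS = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:35"]


def build_command(sample, jar=TRIMMOMATIC_JAR):
    """Trimmomatic PE command for one sample (e.g. 'C1') as an argument list."""
    r1, r2 = sample + "_R1", sample + "_R2"
    return (["java", "-jar", jar, "PE", "-phred33",
             r1 + ".fastq", r2 + ".fastq",                  # inputs
             r1 + "_paired.fastq", r1 + "_unpaired.fastq",  # forward outputs
             r2 + "_paired.fastq", r2 + "_unpaired.fastq"]  # reverse outputs
            + STEPS)


def samples_in(directory="."):
    """Sample names that have both <sample>_R1.fastq and <sample>_R2.fastq."""
    names = []
    for path in sorted(glob.glob(os.path.join(directory, "*_R1.fastq"))):
        sample = os.path.basename(path)[:-len("_R1.fastq")]
        if os.path.isfile(os.path.join(directory, sample + "_R2.fastq")):
            names.append(sample)
    return names


if __name__ == "__main__":
    # Run from the directory that holds the FASTQ files
    for sample in samples_in("."):
        subprocess.run(build_command(sample), check=True)  # stop on failure
```

Saved under any name (trim_all.py, say) and run from the FASTQ directory, it processes C1, C2, C3, T1, T2 and T3 in turn. A bash for loop over *_R1.fastq would do the same job; Python is used here to match the scripts in the rest of the thread.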

Hi bioinformaticssrm2011,

The second command I posted on seqanswers (with an echo in front) should show you each command exactly as it would be executed. That way you can figure out what is wrong.

Shouldn't you use this command in paired-end mode?

java -jar <path to trimmomatic.jar> PE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] [-basein <inputBase> | <input 1> <input 2>] [-baseout <outputBase> | <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2>] <step 1> ...
(reply by WouterDeCoster, 2.3 years ago)
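The echo trick mentioned above means printing each command instead of running it, so you can check the file names before anything executes. A minimal Python sketch of the same dry run, assuming the file naming from the question (shlex.join needs Python 3.8 or newer; dry_run is a hypothetical helper name, and the jar path is shortened for readability):

```python
import shlex


def dry_run(commands):
    """Return each argument list as one shell-quoted line instead of running it."""
    return [shlex.join(cmd) for cmd in commands]


# Commands for two of the samples from the question (jar path shortened)
commands = [
    ["java", "-jar", "trimmomatic-0.36.jar", "PE", "-phred33",
     s + "_R1.fastq", s + "_R2.fastq",
     s + "_R1_paired.fastq", s + "_R1_unpaired.fastq",
     s + "_R2_paired.fastq", s + "_R2_unpaired.fastq",
     "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:35"]
    for s in ["C1", "C2"]
]

for line in dry_run(commands):
    print(line)  # inspect these, then run them for real once they look right
```

If the printed lines match the single-sample command that already works, the loop is building the right arguments.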

I got an error; you can see it in that thread.

(reply by bioinformaticssrm2011, 2.3 years ago)

I've seen that error. Now read what I wrote about using the echo statement and figuring out what's wrong.

(reply by WouterDeCoster, 2.3 years ago)

I'm new to this, so I don't understand the part about echo. Sorry.

(reply by bioinformaticssrm2011, 2.3 years ago)

If you don't know the echo command, you should start by following a command-line tutorial; that will make everything less painful.

(reply by WouterDeCoster, 2.3 years ago)

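This is my Python script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Run bbmap.sh once per line of a config file. Each line holds four
# whitespace-separated fields (ref, in1, in2, outm), passed to bbmap.sh as-is.
import subprocess

with open('/data2/masw_data/Nanda/5A_and_5D/bbmap_config.txt', 'r') as f:
    for line in f:
        fields = line.strip().split()
        if not fields:  # skip blank lines
            continue
        ref, in1, in2, outm = fields
        # run() waits for each bbmap.sh job to finish before starting the next
        subprocess.run(['/data1/masw/bbmap/bbmap.sh', ref, in1, in2, outm,
                        'threads=20', 'minid=0.70', 'nodisk'], check=True)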
(reply by shengweima, 2.3 years ago; edited by genomax)

shengweima: did you post in the wrong thread, or were you presenting an example?

(reply by genomax, 2.3 years ago)
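The idea in shengweima's script, reading one job per line from a config file, also works for Trimmomatic. A minimal sketch, assuming a hypothetical tab-separated sample sheet with one sample name, R1 file and R2 file per line (e.g. C1<TAB>C1_R1.fastq<TAB>C1_R2.fastq). read_samples, trim_command and run_all are hypothetical names; the trimming steps are copied from the question.

```python
import csv
import os
import subprocess

STEPS = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:35"]


def read_samples(handle):
    """Yield (sample, r1, r2) tuples from a tab-separated sample sheet.
    Blank lines and lines starting with '#' are skipped."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        sample, r1, r2 = row
        yield sample, r1, r2


def trim_command(sample, r1, r2, jar, outdir):
    """Build the Trimmomatic PE argument list for one row of the sheet."""
    def out(suffix):
        return os.path.join(outdir, sample + suffix)
    return (["java", "-jar", jar, "PE", "-phred33", r1, r2,
             out("_R1_paired.fastq"), out("_R1_unpaired.fastq"),
             out("_R2_paired.fastq"), out("_R2_unpaired.fastq")] + STEPS)


def run_all(sheet_path, jar, outdir):
    """Run Trimmomatic once per sample, one after another."""
    os.makedirs(outdir, exist_ok=True)
    with open(sheet_path, newline="") as handle:
        for sample, r1, r2 in read_samples(handle):
            subprocess.run(trim_command(sample, r1, r2, jar, outdir), check=True)
```

For example, run_all("samples.tsv", os.path.expanduser("~/Trimmomatic-0.36/trimmomatic-0.36.jar"), "trimmed"), where samples.tsv and trimmed are placeholder names. A sample sheet is handy when the file names do not follow one predictable pattern.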
agata88 (Poland) wrote, 2.3 years ago:

You need to specify the directories for your input and output files:

inputdirectory: directory with the input FASTQ files
processed: directory for the output files
log: directory for the log information
program: directory containing the Trimmomatic jar file

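#!/usr/bin/env python
import os
import subprocess

# Set these to your own directories (placeholder paths)
inputdirectory = "/path/to/inputdirectory"
processed = "/path/to/processed"
log = "/path/to/log"
program = "/path/to/Trimmomatic-0.35"

for fileR1 in sorted(os.listdir(inputdirectory)):
    if "_R1" not in fileR1:
        continue
    fileR2 = fileR1.replace("_R1", "_R2", 1)
    if not os.path.isfile(os.path.join(inputdirectory, fileR2)):
        continue  # no mate file found, skip
    output1 = fileR1.split(".")[0]  # e.g. C1_R1
    output2 = fileR2.split(".")[0]  # e.g. C1_R2
    cmd = ["java", "-jar", os.path.join(program, "trimmomatic-0.35.jar"),
           "PE", "-threads", "12", "-phred33",
           os.path.join(inputdirectory, fileR1), os.path.join(inputdirectory, fileR2),
           # per-sample names, so unpaired reads are not overwritten by the next sample
           os.path.join(processed, output1 + "_trimmed.fastq.gz"),
           os.path.join(processed, output1 + "_unpaired.fastq.gz"),
           os.path.join(processed, output2 + "_trimmed.fastq.gz"),
           os.path.join(processed, output2 + "_unpaired.fastq.gz"),
           "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:35"]
    # Same as "> logfile 2>&1": stdout and stderr go to one log file per sample
    with open(os.path.join(log, output1 + "_trimmomatic.txt"), "w") as logfile:
        subprocess.call(cmd, stdout=logfile, stderr=subprocess.STDOUT)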

Hope it helps :) Best,

Agata


Hi, when I tried to use the script above, I got an error.

    #! python
    import os
    import sys
    import subprocess 
    inputdirectory = "/home/shashank/Desktop/patel_chest_NSC/inputdirectory"
   processed = "/home/shashank/Desktop/patel_chest_NSC/processed"
   log="/home/shashank/Desktop/patel_chest_NSC/log"
   for fileR1 in os.listdir(inputdirectory): 
        dividing = fileR1.split(".")
        if ("R1" in fileR1) :
            fileR2 = fileR1.replace('R1', 'R2')
            if os.path.isfile(inputdirectory + plikR2) :
                dividing1 = fileR2.split(".")
                log1 = dividing[0]
                output1 = dividing[0]
                output2 = dividing1[0]
                subprocess.call("java -jar " + program + "~/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 12 -phred33 " + inputdirectory + fileR1 + " " + inputdirectory + fileR2 + " " + processed + output1 +"_trimmed.fastq.gz " +  "output_forward_unpaired.fq.gz "  + processed + output2 + "_trimmed.fastq.gz " + "output_reverse_unpaired.fq.gz " + " LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35" +" >" + log + log1 + "_trimmomatic.txt" + " 2>&1", shell=True)

The error is:

    Traceback (most recent call last):
      File "python.py", line 10, in <module>
        if os.path.isfile(inputdirectory + plikR2) :
    NameError: name 'plikR2' is not defined
(reply by bioinformaticssrm2011, 2.3 years ago)

Please replace 'plikR2' with 'fileR2'.

(reply by agata88, 2.3 years ago)

Hi, thanks for your help, but I got an error:

      File "python.py", line 8
        if ("R1" in fileR1):
                           ^
    IndentationError: unindent does not match any outer indentation level

Then I checked the indentation and tried to fix it, and later I got another error:

    Traceback (most recent call last):
      File "python.py", line 6, in <module>
        for fileR1 in os.listdir(input):
    TypeError: coercing to Unicode: need string or buffer, builtin_function_or_method found

(reply by bioinformaticssrm2011, 2.3 years ago)
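The IndentationError above comes from inconsistent indentation widths (in the version posted earlier, some lines are indented with 3 spaces and others with 4). A minimal sketch, assuming you want to check a script before running it: compile() raises IndentationError, a subclass of SyntaxError, without executing anything. check_source is a hypothetical helper name; from the shell, python -m py_compile python.py performs the same kind of check.

```python
def check_source(source, filename="python.py"):
    """Compile source without running it; report the first syntax problem."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return "%s on line %s: %s" % (type(err).__name__, err.lineno, err.msg)
    return "OK"


# Line 4 is indented with 3 spaces, which matches no enclosing block (0 or 4)
bad = "for f in []:\n    if True:\n        pass\n   x = 1\n"
good = "for f in []:\n    if True:\n        pass\n    x = 1\n"

print(check_source(bad))
print(check_source(good))
```

To check a real file, pass its contents: with open("python.py") as fh: print(check_source(fh.read())).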

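'input' is a built-in function in Python, so it shouldn't be used as a variable name. Here it was never assigned, so os.listdir(input) received the built-in function instead of a path. Call it inputfile or something like that.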

(reply by WouterDeCoster, 2.3 years ago)
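To see why the error mentions builtin_function_or_method: when a name such as input is never assigned, Python falls back to the built-in of that name. A minimal sketch in Python 3 (the traceback in this thread comes from Python 2, so the exact message text differs); try_listdir is a hypothetical helper name.

```python
import os


def try_listdir(path):
    """Call os.listdir and report a TypeError instead of crashing."""
    try:
        os.listdir(path)
        return "ok"
    except TypeError as err:
        return "TypeError: " + str(err)


# 'input' was never assigned in the script, so the name still refers to the
# built-in function, and os.listdir rejects it
print(callable(input))
print(try_listdir(input))

inputdirectory = "."  # a name that does not shadow a built-in
print(try_listdir(inputdirectory))
```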

That's helpful.

But another error popped up:

    Traceback (most recent call last):
      File "python.py", line 6, in <module>
        for fileR1 in os.listdir(inputfile):
    NameError: name 'inputfile' is not defined

(reply by bioinformaticssrm2011, 2.3 years ago)

That's because you didn't define the variable.

(reply by WouterDeCoster, 2.3 years ago)

Yes, that's correct, it shouldn't be 'input'. I forgot about that when I was renaming the variables, sorry. I'm going to change it right now.

(reply by agata88, 2.3 years ago)

Changed; it should be OK now. I also fixed the indentation.

(reply by agata88, 2.3 years ago)

Hi, thanks for your Python script. I'm trying to run it, but it doesn't seem to do anything or produce any output, and no error appears at any step. Can you help me, please? Thanks!

(reply by Safa.A, 24 months ago)

Hi, check the indentation; maybe something got shifted during copy/paste. Best, Agata

(reply by agata88, 24 months ago)
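One possible cause of a silent run, not confirmed in this thread: the inputdirectory set in bioinformaticssrm2011's version above has no trailing slash, so inputdirectory + fileR2 builds a path that does not exist, os.path.isfile returns False for every file, and the loop finishes without doing anything or raising an error. A minimal sketch that joins paths with os.path.join and reports unmatched files instead of skipping them silently (find_pairs is a hypothetical helper name):

```python
import os


def find_pairs(inputdirectory):
    """Return (pairs, problems) for *_R1* files in inputdirectory.

    os.path.join inserts the separator itself, so the result is the same
    whether or not inputdirectory ends with a slash."""
    if not os.path.isdir(inputdirectory):
        raise FileNotFoundError("not a directory: " + inputdirectory)
    pairs, problems = [], []
    for fileR1 in sorted(os.listdir(inputdirectory)):
        if "_R1" not in fileR1:
            continue
        fileR2 = fileR1.replace("_R1", "_R2", 1)
        if os.path.isfile(os.path.join(inputdirectory, fileR2)):
            pairs.append((fileR1, fileR2))
        else:
            problems.append(fileR1 + ": mate " + fileR2 + " not found")
    if not pairs:
        print("No R1/R2 pairs found in " + inputdirectory)
    return pairs, problems


# The pitfall: plain concatenation silently drops the "/" between the parts
inputdirectory = "/home/shashank/Desktop/patel_chest_NSC/inputdirectory"
print(inputdirectory + "C1_R2.fastq")
# → /home/shashank/Desktop/patel_chest_NSC/inputdirectoryC1_R2.fastq
print(os.path.join(inputdirectory, "C1_R2.fastq"))
# → /home/shashank/Desktop/patel_chest_NSC/inputdirectory/C1_R2.fastq (on Linux/macOS)
```

Printing the pairs found before starting any trimming makes an empty or misconfigured input directory obvious.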
Vijay Lakhujani (India) wrote, 5 weeks ago:
