Question: Trimmomatic output giving error message.
29 days ago by
anikcropscience30 wrote:

Hello, I am running Trimmomatic for trimming paired-end whole-genome sequence data in a loop. The code is the following:

adutta$ cd /Users/adutta/Desktop/Pangenome_kmer_paper/Raw_seq
adutta$ for infile in *_1.fastq.gz
> do
>   base=$(basename ${infile} _1.fastq.gz)
>   trimmomatic PE ${infile} ${base}_2.fastq.gz \
>                ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz \
>                ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz \
>                SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
> done

I do get output files, but I also get the following error message:

Exception in thread "main" java.io.FileNotFoundException: base=_2.fastq.gz (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
    at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:268)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)

How can I solve the problem? Please help.

sequencing snp next-gen genome • 90 views
ADD COMMENTlink modified 29 days ago • written 29 days ago by anikcropscience30

Before running the trimmomatic command in the loop, just echo "$infile $base" within the loop and make sure every value being echoed corresponds to a valid file. There is probably an empty or stray entry that's messing with the loop.

ADD REPLYlink written 29 days ago by RamRS28k

How can I do that? Do I just type that command after the "for ..." line and before the "do" line?

ADD REPLYlink written 29 days ago by anikcropscience30

Are you copy-pasting code from somewhere? If so, please spend some time understanding the code. After you assign the value to the variable $base, just echo the values instead of the trimmomatic command. Once you're sure they're the right value, echo the entire command. And if the command looks right, run it.

Step-1

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  echo "$infile $base"
done

Step-2

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  echo "trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10"
done

Step-3

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
done
ADD REPLYlink modified 29 days ago • written 29 days ago by RamRS28k

Hi, yes I am copying the script from somewhere. But they do not explain it. So after I run the first step as you showed, I get the following output:

Raw_seq adutta$ for infile in *_1.fastq.gz
> do
>   base=$(basename ${infile} _1.fastq.gz)
>   echo "$infile $base"
> done
SRR4907761_1.fastq.gz SRR4907761
SRR4907762_1.fastq.gz SRR4907762
SRR4907763_1.fastq.gz SRR4907763
SRR4907764_1.fastq.gz SRR4907764
SRR4907765_1.fastq.gz SRR4907765
SRR4907766_1.fastq.gz SRR4907766
base=_1.fastq.gz base=

Then, I run the second step, which gives me the following output:

Raw_seq adutta$ for infile in *_1.fastq.gz
> do
>   base=$(basename ${infile} _1.fastq.gz)
>   echo "trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10"
> done
trimmomatic PE SRR4907761_1.fastq.gz SRR4907761_2.fastq.gz SRR4907761_1.trim.fastq.gz SRR4907761_1un.trim.fastq.gz SRR4907761_2.trim.fastq.gz SRR4907761_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907762_1.fastq.gz SRR4907762_2.fastq.gz SRR4907762_1.trim.fastq.gz SRR4907762_1un.trim.fastq.gz SRR4907762_2.trim.fastq.gz SRR4907762_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907763_1.fastq.gz SRR4907763_2.fastq.gz SRR4907763_1.trim.fastq.gz SRR4907763_1un.trim.fastq.gz SRR4907763_2.trim.fastq.gz SRR4907763_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907764_1.fastq.gz SRR4907764_2.fastq.gz SRR4907764_1.trim.fastq.gz SRR4907764_1un.trim.fastq.gz SRR4907764_2.trim.fastq.gz SRR4907764_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907765_1.fastq.gz SRR4907765_2.fastq.gz SRR4907765_1.trim.fastq.gz SRR4907765_1un.trim.fastq.gz SRR4907765_2.trim.fastq.gz SRR4907765_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907766_1.fastq.gz SRR4907766_2.fastq.gz SRR4907766_1.trim.fastq.gz SRR4907766_1un.trim.fastq.gz SRR4907766_2.trim.fastq.gz SRR4907766_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE base=_1.fastq.gz base=_2.fastq.gz base=_1.trim.fastq.gz base=_1un.trim.fastq.gz base=_2.trim.fastq.gz base=_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10

But in the first step, I have other files with "_2.fastq.gz" which are not showing up after the command. Is that the problem?

ADD REPLYlink written 29 days ago by anikcropscience30
Can you show us a simple listing of the files, using ls -lh *_1.fastq.gz? It looks like you have at least one bad file whose name doesn't start with SRR.
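If the listing is long, it may be quicker to ask find for zero-byte files directly. The sketch below is only an illustration: it builds a throwaway directory with made-up file names (demo_dir and SRR0000001 are assumptions, not your data) so that it runs anywhere. In practice you would point find at your real Raw_seq folder.

```shell
# Hypothetical demo directory with stand-in files; replace with your real folder.
demo_dir=$(mktemp -d)
printf 'x' > "${demo_dir}/SRR0000001_1.fastq.gz"   # non-empty stand-in for a real read file
: > "${demo_dir}/base=_1.fastq.gz"                 # zero-byte stray file

# List forward-read files in that directory that are exactly zero bytes.
empty_files=$(find "${demo_dir}" -maxdepth 1 -type f -name '*_1.fastq.gz' -size 0 -exec basename {} \;)
echo "${empty_files}"
```

Anything this prints is a file that the *_1.fastq.gz glob will pick up even though it contains no reads.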

ADD REPLYlink written 29 days ago by genomax87k

After running your command, I got the following output:

-rw-r--r--  1 adutta  staff   421M Jul 12 22:06 SRR4907761_1.fastq.gz
-rw-r--r--  1 adutta  staff   987M Jul 12 21:05 SRR4907762_1.fastq.gz
-rw-r--r--  1 adutta  staff   697M Jul 12 20:37 SRR4907763_1.fastq.gz
-rw-r--r--  1 adutta  staff   553M Jul 12 20:21 SRR4907764_1.fastq.gz
-rw-r--r--  1 adutta  staff   985M Jul 12 20:06 SRR4907765_1.fastq.gz
-rw-r--r--  1 adutta  staff   671M Jul 12 19:51 SRR4907766_1.fastq.gz
-rw-r--r--@ 1 adutta  staff     0B Jul 13 00:20 base=_1.fastq.gz

I realized that the file base=_1.fastq.gz was causing the problem. I had run the same script in the same folder again and again while that file was still there, and I guess that triggered the error each time. When I removed that file and ran the program again, it completed and I got the output. Is there any explanation of why that file caused the problem?

ADD REPLYlink written 29 days ago by anikcropscience30

base=_1.fastq.gz was likely created when you were trying things out. As you can see it was an empty file.

As long as the rest of the data looks good you can go on to next step.
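As for why it broke the loop: the glob *_1.fastq.gz also matches base=_1.fastq.gz, basename strips the suffix and leaves "base=", and Trimmomatic is then asked to open base=_2.fastq.gz, which does not exist. One way to guard against this is to skip any entry whose forward or reverse file is missing or empty. The following is a minimal sketch under that assumption; list_valid_pairs is a hypothetical helper name, and the demo directory and SRR0000001 file names are made up so the example can run on its own.

```shell
# Print sample base names whose _1 and _2 files both exist and are non-empty.
# list_valid_pairs is a hypothetical helper, not part of Trimmomatic.
list_valid_pairs() {
  dir=$1
  for infile in "${dir}"/*_1.fastq.gz; do
    # -s is true only for files that exist and have a size greater than zero
    [ -s "${infile}" ] || continue
    base=$(basename "${infile}" _1.fastq.gz)
    [ -s "${dir}/${base}_2.fastq.gz" ] || continue
    echo "${base}"
  done
}

# Demo on a throwaway directory with stand-in files (not real sequencing data).
demo_dir=$(mktemp -d)
printf 'x' > "${demo_dir}/SRR0000001_1.fastq.gz"
printf 'x' > "${demo_dir}/SRR0000001_2.fastq.gz"
: > "${demo_dir}/base=_1.fastq.gz"    # empty stray file, should be skipped

pairs=$(list_valid_pairs "${demo_dir}")
echo "${pairs}"
```

In the real folder you could loop over the output of list_valid_pairs . and run trimmomatic only for those base names.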

ADD REPLYlink written 29 days ago by genomax87k

Thank you very much for helping me out.

ADD REPLYlink written 29 days ago by anikcropscience30

You only specified one input file explicitly in trimmomatic PE ${infile} ${base}_2.fastq.gz while running the PE algorithm. It is better to use something like trimmomatic PE ${base}_1.fastq.gz ${base}_2.fastq.gz. Edit: I guess the former is OK. If you are deriving the base name from a file, though, it is probably good to be explicit.

ADD REPLYlink modified 29 days ago • written 29 days ago by genomax87k

So should I write ${base}_1.fastq.gz before the ${base}_2.fastq.gz?

ADD REPLYlink written 29 days ago by anikcropscience30

It may be more explicit to do that while removing infile. In any case use an echo command before trimmomatic to print out expected command lines and inspect them to make sure they look ok. Then remove echo to actually run them as @RamRS said.
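A rough sketch of that idea, with both inputs built from the base name and a switch between printing and running, could look like this. DRY_RUN and trim_pair are hypothetical names introduced for illustration; the trimming options are copied unchanged from the original post.

```shell
# DRY_RUN is a hypothetical switch: 1 prints the command, anything else runs it.
DRY_RUN=1

# trim_pair is a hypothetical helper that builds the command from a base name.
trim_pair() {
  base=$1
  # Both input files are derived from the base name, so no ${infile} is needed.
  set -- trimmomatic PE \
    "${base}_1.fastq.gz" "${base}_2.fastq.gz" \
    "${base}_1.trim.fastq.gz" "${base}_1un.trim.fastq.gz" \
    "${base}_2.trim.fastq.gz" "${base}_2un.trim.fastq.gz" \
    SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
  if [ "${DRY_RUN}" = 1 ]; then
    echo "$@"      # inspect the command line first
  else
    "$@"           # actually run trimmomatic
  fi
}

cmd=$(trim_pair SRR4907761)
echo "${cmd}"
```

Once the printed lines look right, setting DRY_RUN=0 would execute the same command instead of echoing it.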

ADD REPLYlink written 29 days ago by genomax87k

Hi, so I added that command you suggested, but it still produces the same message. Nothing has changed. Any further suggestions, please?

ADD REPLYlink written 29 days ago by anikcropscience30