Trimmomatic output giving error message.
3.8 years ago

Hello, I am running Trimmomatic for trimming paired-end whole-genome sequence data in a loop. The code is the following:

    cd /Users/adutta/Desktop/Pangenome_kmer_paper/Raw_seq
    for infile in *_1.fastq.gz
    do
      base=$(basename ${infile} _1.fastq.gz)
      trimmomatic PE ${infile} ${base}_2.fastq.gz \
                   ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz \
                   ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz \
                   SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
    done

I do get output files, but I also get the following error message:

Exception in thread "main" java.io.FileNotFoundException: base=_2.fastq.gz (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
    at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:268)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)

How can I solve the problem? Please help.

SNP sequencing genome next-gen • 1.2k views

Before running the trimmomatic command in the loop, just echo "$infile $base" within the loop and make sure every value being echoed is a valid file. There is probably an unexpected entry that's messing with the loop.


How can I do that? Do I just type that command after the "for ..." line and before the "do" line?


Are you copy-pasting code from somewhere? If so, please spend some time understanding the code. After you assign the value to the variable $base, just echo the values instead of running the trimmomatic command. Once you're sure they're the right values, echo the entire command. If the command looks right, run it.

Step-1

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  echo "$infile $base"
done

Step-2

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  echo "trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10"
done

Step-3

for infile in *_1.fastq.gz
do
  base=$(basename ${infile} _1.fastq.gz)
  trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
done
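
Steps 2 and 3 can also be folded into one loop with a switch, so that the command you inspect is exactly the command that later runs. This is only a sketch: DRY_RUN, run_or_print and trim_all are made-up names for illustration, not trimmomatic options, and the trimming steps are copied unchanged from the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical switch: 1 = only print the commands (default), 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command line in dry-run mode, otherwise execute it.
run_or_print() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Loop over forward-read files in the current directory.
trim_all() {
  local infile base
  for infile in *_1.fastq.gz
  do
    # If the glob matched nothing, the literal pattern is left over: skip it.
    [ -e "$infile" ] || continue
    base=$(basename "$infile" _1.fastq.gz)
    run_or_print trimmomatic PE "$infile" "${base}_2.fastq.gz" \
      "${base}_1.trim.fastq.gz" "${base}_1un.trim.fastq.gz" \
      "${base}_2.trim.fastq.gz" "${base}_2un.trim.fastq.gz" \
      SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
  done
}

trim_all
```

Run it as is to read the printed commands, then run it again with DRY_RUN=0 set in the environment once they look right.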

Hi, yes, I am copying the script from somewhere, but they do not explain it. After I ran the first step as you showed, I got the following output:

Raw_seq adutta$ for infile in *_1.fastq.gz
> do
>   base=$(basename ${infile} _1.fastq.gz)
>   echo "$infile $base"
> done
SRR4907761_1.fastq.gz SRR4907761
SRR4907762_1.fastq.gz SRR4907762
SRR4907763_1.fastq.gz SRR4907763
SRR4907764_1.fastq.gz SRR4907764
SRR4907765_1.fastq.gz SRR4907765
SRR4907766_1.fastq.gz SRR4907766
base=_1.fastq.gz base=

Then I ran the second step, which gave me the following output:

Raw_seq adutta$ for infile in *_1.fastq.gz
> do
>   base=$(basename ${infile} _1.fastq.gz)
>   echo "trimmomatic PE ${infile} ${base}_2.fastq.gz ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10"
> done
trimmomatic PE SRR4907761_1.fastq.gz SRR4907761_2.fastq.gz SRR4907761_1.trim.fastq.gz SRR4907761_1un.trim.fastq.gz SRR4907761_2.trim.fastq.gz SRR4907761_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907762_1.fastq.gz SRR4907762_2.fastq.gz SRR4907762_1.trim.fastq.gz SRR4907762_1un.trim.fastq.gz SRR4907762_2.trim.fastq.gz SRR4907762_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907763_1.fastq.gz SRR4907763_2.fastq.gz SRR4907763_1.trim.fastq.gz SRR4907763_1un.trim.fastq.gz SRR4907763_2.trim.fastq.gz SRR4907763_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907764_1.fastq.gz SRR4907764_2.fastq.gz SRR4907764_1.trim.fastq.gz SRR4907764_1un.trim.fastq.gz SRR4907764_2.trim.fastq.gz SRR4907764_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907765_1.fastq.gz SRR4907765_2.fastq.gz SRR4907765_1.trim.fastq.gz SRR4907765_1un.trim.fastq.gz SRR4907765_2.trim.fastq.gz SRR4907765_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE SRR4907766_1.fastq.gz SRR4907766_2.fastq.gz SRR4907766_1.trim.fastq.gz SRR4907766_1un.trim.fastq.gz SRR4907766_2.trim.fastq.gz SRR4907766_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
trimmomatic PE base=_1.fastq.gz base=_2.fastq.gz base=_1.trim.fastq.gz base=_1un.trim.fastq.gz base=_2.trim.fastq.gz base=_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10

I also have other files ending in "_2.fastq.gz" that do not show up in the output of the first step. Is that the problem?


Can you show us a simple listing of the files with ls -lh *_1.fastq.gz? It looks like you have at least one bad file whose name doesn't start with SRR.
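
If the listing is long, the suspect files can also be picked out directly. A minimal sketch, assuming the bad file is a zero-byte one sitting in the current directory; list_empty_matches is a made-up helper name, and -maxdepth is a common find extension (available in GNU and BSD find) rather than POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file in the current directory
# that the loop's glob would match and that contains zero bytes.
list_empty_matches() {
  find . -maxdepth 1 -type f -name '*_1.fastq.gz' -size 0
}

list_empty_matches
```

Anything it prints is a file the loop would hand to trimmomatic even though it holds no reads.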


After running your command, I got the following output:

-rw-r--r--  1 adutta  staff   421M Jul 12 22:06 SRR4907761_1.fastq.gz
-rw-r--r--  1 adutta  staff   987M Jul 12 21:05 SRR4907762_1.fastq.gz
-rw-r--r--  1 adutta  staff   697M Jul 12 20:37 SRR4907763_1.fastq.gz
-rw-r--r--  1 adutta  staff   553M Jul 12 20:21 SRR4907764_1.fastq.gz
-rw-r--r--  1 adutta  staff   985M Jul 12 20:06 SRR4907765_1.fastq.gz
-rw-r--r--  1 adutta  staff   671M Jul 12 19:51 SRR4907766_1.fastq.gz
-rw-r--r--@ 1 adutta  staff     0B Jul 13 00:20 base=_1.fastq.gz

I realized that the file base=_1.fastq.gz was causing the problem. I had run the same script in the same folder again and again with that file still there, and I guess that is what triggered the error. When I removed the file and ran the program again, it completed and I got the output. Is there any explanation of why that file caused the problem?


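base=_1.fastq.gz was likely created when you were trying things out. As you can see, it was an empty file. Because its name ends in _1.fastq.gz, the glob in the for loop matched it, so the loop ran one extra iteration with $base set to "base=" and trimmomatic went looking for a mate file, base=_2.fastq.gz, that does not exist.

As long as the rest of the data looks good, you can go on to the next step.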
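
To keep a stray file like this from reaching trimmomatic again, the loop can check its inputs first. The following is a sketch under assumptions, not a fix taken from the thread: valid_bases is a made-up helper name, and the -s test only catches files that are missing or exactly zero bytes, so a truncated or corrupt archive would still pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample name of every pair whose two mate
# files both exist and are non-empty; report everything else on stderr.
valid_bases() {
  local infile base
  for infile in *_1.fastq.gz
  do
    # If the glob matched nothing, the literal pattern is left over: skip it.
    [ -e "$infile" ] || continue
    base=$(basename "$infile" _1.fastq.gz)
    if [ ! -s "$infile" ]; then
      echo "skipping $infile: empty file" >&2
      continue
    fi
    if [ ! -s "${base}_2.fastq.gz" ]; then
      echo "skipping $infile: mate ${base}_2.fastq.gz is missing or empty" >&2
      continue
    fi
    echo "$base"
  done
}

# Show which pairs would be processed; nothing is trimmed here.
valid_bases | while IFS= read -r base
do
  echo "would trim: ${base}_1.fastq.gz ${base}_2.fastq.gz"
done
```

With the files from this thread, base=_1.fastq.gz would be reported as empty and skipped instead of producing the FileNotFoundException.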


Thank you very much for helping me out.


You only spelled out one of the two input files by name in trimmomatic PE ${infile} ${base}_2.fastq.gz while running the PE algorithm. It is better to use something like trimmomatic PE ${base}_1.fastq.gz ${base}_2.fastq.gz. Edit: I guess the former is OK. If you are deriving the base name from a file, then it is probably good to be explicit.


So should I write ${base}_1.fastq.gz before the ${base}_2.fastq.gz?


It may be more explicit to do that and drop ${infile} from the command. In any case, put an echo in front of trimmomatic to print the expected command lines and inspect them to make sure they look OK. Then remove the echo to actually run them, as @RamRS said.


Hi, so I added that command you suggested, but it still produces the same message. Nothing has changed. Any further suggestions, please?
