3.8 years ago
anikcropscience
Hello, I am running Trimmomatic for trimming paired-end whole-genome sequence data in a loop. The code is the following:
adutta$ cd /Users/adutta/Desktop/Pangenome_kmer_paper/Raw_seq
adutta$ for infile in *_1.fastq.gz
> do
> base=$(basename ${infile} _1.fastq.gz)
> trimmomatic PE ${infile} ${base}_2.fastq.gz \
> ${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz \
> ${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz \
> SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10
> done
I do get output files, but I also get a message like the following:
Exception in thread "main" java.io.FileNotFoundException: base=_2.fastq.gz (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:268)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
How can I solve the problem? Please help.
Before running the trimmomatic command in the loop, just
echo "$infile $base"
within the loop and ensure all values being echoed are valid files. There is probably an empty entry that's messing with the loop.
How can I do that? Like, just type that command after the "for...." and before the "do" command?
Are you copy-pasting code from somewhere? If so, please spend some time understanding the code. After you assign the value to the variable $base, just echo the values instead of the trimmomatic command. Once you're sure they're the right values, echo the entire command. And if the command looks right, run it.
Step-1: echo the values of $infile and $base.
Step-2: echo the entire trimmomatic command.
Step-3: run the command.
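The three steps above can be sketched as a dry run. This is only an illustration built from the loop in the question: the function name print_trim_commands is made up for this example, and the existence check on the first line of the loop is an added assumption to handle the case where no file matches the pattern.

```shell
# Dry run: print each trimmomatic command instead of executing it.
# print_trim_commands is a hypothetical name used only for this sketch.
print_trim_commands() {
    for infile in *_1.fastq.gz
    do
        # If nothing matches, the pattern itself ends up in $infile; skip it.
        [ -e "${infile}" ] || continue

        base=$(basename "${infile}" _1.fastq.gz)

        # Step-1: print the values the loop is working with.
        echo "${infile} ${base}"

        # Step-2: print the full command line; nothing is executed here.
        echo trimmomatic PE "${infile}" "${base}_2.fastq.gz" \
            "${base}_1.trim.fastq.gz" "${base}_1un.trim.fastq.gz" \
            "${base}_2.trim.fastq.gz" "${base}_2un.trim.fastq.gz" \
            SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:10 TRAILING:10

        # Step-3: when the printed lines look right, remove the leading "echo".
    done
}

print_trim_commands
```

Every printed line should name files that actually exist in the folder. A line whose base is not a real sample name points to the file that is breaking the loop.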
Hi, yes I am copying the script from somewhere. But they do not explain it. So after I run the first step as you showed, I get the following output:
Then, I run the second step, which gives me the following output:
But in the first step, I have other files with "_2.fastq.gz" which are not showing up after the command. Is that the problem?
Can you show us a simple listing of the files (ls -lh *_1.fastq.gz)? Looks like you have at least one bad file that doesn't appear to start with SRR*.
After running your command, I got the following output:
I realized that the file base=_1.fastq.gz was creating a problem. I ran the same script in the same folder again and again while keeping that file, and I guess that created the problem. When I removed that file and ran the program again, it ran and I got the output. Any explanation of why that file caused the problem?
base=_1.fastq.gz was likely created when you were trying things out. As you can see, it was an empty file.
As long as the rest of the data looks good you can go on to the next step.
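To spell out the mechanism: the pattern *_1.fastq.gz also matches the stray file base=_1.fastq.gz, and stripping the suffix from that name leaves base=, so the loop asks Trimmomatic for base=_2.fastq.gz, which is the file named in the FileNotFoundException. A minimal sketch of a pre-check that would flag such a file is shown below. The helper name check_pairs and the SKIP/OK message format are assumptions made for this example, not part of Trimmomatic.

```shell
# Report read-1 files that are empty or have no matching read-2 file.
# check_pairs is a hypothetical helper name used only for this sketch.
check_pairs() {
    for infile in *_1.fastq.gz
    do
        # If nothing matches, the pattern itself ends up in $infile; skip it.
        [ -e "${infile}" ] || continue

        base=$(basename "${infile}" _1.fastq.gz)
        mate="${base}_2.fastq.gz"

        if [ ! -s "${infile}" ]; then
            # -s is true only for files that exist and are not empty.
            echo "SKIP ${infile}: file is empty"
        elif [ ! -e "${mate}" ]; then
            echo "SKIP ${infile}: mate ${mate} not found"
        else
            echo "OK ${infile} ${mate}"
        fi
    done
}

check_pairs
```

Running this in the folder before the real loop lists each pair that is safe to trim and names any file that would make Trimmomatic fail.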
Thank you very much for helping me out.
You only specified one file
trimmomatic PE ${infile} ${base}_2.fastq.gz
while running the PE algorithm. It is better to use something like
trimmomatic PE ${base}_1.fastq.gz ${base}_2.fastq.gz
Edit: I guess the former is ok. If you are getting the basename from a file then it is probably good to be explicit.
So should I write ${base}_1.fastq.gz before the ${base}_2.fastq.gz?
It may be more explicit to do that while removing infile. In any case, use an echo command before trimmomatic to print out the expected command lines and inspect them to make sure they look ok. Then remove echo to actually run them, as @RamRS said.
Hi, so I added that command you suggested, but it still produces the same message. Nothing has changed. Any further suggestions, please?