Hello, I'm having an issue with paired-end trimming. My script does not seem to recognize the input files, and I have no idea what is wrong.
```bash
#!/bin/bash
#PBS -N trimmomatics_job
#PBS -q batch
#PBS -l walltime=72:00:00
#PBS -l nodes=1:ppn=40,mem=40gb
#PBS -W x=naccesspolicy:UNIQUEUSER
#PBS -j oe
#PBS -A job
module load java
INPATH1=/home/groups/dir/subdir
OUTPATH=/home/groups/dir/subdir
cd $INPATH1
for dir in */;
do
    for file1 in $dir/*.fq.gz;
    do
        bname1=$(basename $file1 '.fq.gz')
        sample1="$( cut -d'_' -f 1,2,3<<<"$bname1")"
        read1="$( cut -d'_' -f 4 <<<"$bname1")"
        for file2 in $dir/*.fq.gz;
        do
            bname2=$(basename $file2 '.fq.gz')
            sample2="$( cut -d'_' -f 1,2,3<<<"$bname2")"
            read2="$( cut -d'_' -f 4 <<<"$bname2")"
            if [ "$sample1" == "$sample2" ] && [ "$read1" != "$read2" ] \
                && [ "$read1" == 1 ] ;
            then
                echo "$sample2" "$sample2" "$read1" "$read2"
                input1=$INPATH1/$bname1.fq.gz
                input2=$INPATH2/$bname2.fq.gz
                output1=$OUTPATH/$bname1.paired.fq.gz
                output2=$OUTPATH/$bname1.unpaired.fq.gz
                output3=$OUTPATH/$bname2.paired.fq.gz
                output4=$OUTPATH/$bname2.unpaired.fq.gz
                echo "$input1" "$input2"
                cd /home/usr/tools/Trimmomatic-0.39
                java -jar trimmomatic-0.39.jar PE -phred33 \
                    "$input1" "$input2" "$output1" "$output2" "$output3" "$output4" \
                    ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
                    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
            fi
        done
    done
done
```
However, I keep getting an error message like this:
```
Exception in thread "main" java.io.FileNotFoundException: /dir/subdir/_L01_100_1.fq.gz (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
    at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:265)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
```
The error is clear: Trimmomatic cannot find the input file.

Looking at your script, you do need to change the values of these two variables (`INPATH1` and `OUTPATH`) to match real folders on your server.
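Apart from the folder names, the script as posted has path bugs that could produce this kind of error on their own: `input2` is built from `$INPATH2`, which is never set; both inputs are rebuilt from `basename`, which drops the sample subfolder (`$dir`); and the `cd` into the Trimmomatic folder inside the loop makes the relative `$dir/*.fq.gz` glob stop matching on later iterations. Below is a minimal sketch of a corrected loop, assuming that mates differ only in their `_1.fq.gz`/`_2.fq.gz` suffix and that `BGI_Adapters.fa` sits next to the jar (as the original `cd` implies). `TRIMMO_DIR`, `INPATH` and `mate2_of` are names introduced for illustration, and the folder paths are the placeholders from the post.

```bash
#!/bin/bash
# Keep the #PBS header and `module load java` lines from the original script here.
# Sketch only: assumes mates are named <sample>_1.fq.gz / <sample>_2.fq.gz
# and live in one subfolder per sample under INPATH (hypothetical layout).
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

INPATH=${INPATH:-/home/groups/dir/subdir}        # placeholder path from the post
OUTPATH=${OUTPATH:-/home/groups/dir/subdir}      # placeholder path from the post
TRIMMO_DIR=${TRIMMO_DIR:-/home/usr/tools/Trimmomatic-0.39}

# Print the mate-2 path for a mate-1 path: x/S_L01_109_1.fq.gz -> x/S_L01_109_2.fq.gz
mate2_of() {
    printf '%s\n' "${1%_1.fq.gz}_2.fq.gz"
}

# Full paths come straight from the glob, so the subfolder is never lost
# and no `cd` is needed.
for r1 in "$INPATH"/*/*_1.fq.gz; do
    r2=$(mate2_of "$r1")
    if [[ ! -f "$r2" ]]; then
        echo "WARNING: no mate found for $r1" >&2
        continue
    fi
    base1=$(basename "$r1" .fq.gz)
    base2=$(basename "$r2" .fq.gz)
    echo "Trimming $r1 + $r2"
    # PE argument order: in_fwd in_rev out_fwd_paired out_fwd_unpaired out_rev_paired out_rev_unpaired
    java -jar "$TRIMMO_DIR/trimmomatic-0.39.jar" PE -phred33 \
        "$r1" "$r2" \
        "$OUTPATH/$base1.paired.fq.gz" "$OUTPATH/$base1.unpaired.fq.gz" \
        "$OUTPATH/$base2.paired.fq.gz" "$OUTPATH/$base2.unpaired.fq.gz" \
        ILLUMINACLIP:"$TRIMMO_DIR/BGI_Adapters.fa":2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done
```

Because the loop starts from the `_1` files and derives each mate's name, it needs no nested loop and never compares every file against every other file.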
Well, I deliberately changed the folder names (data confidentiality and so on :D).
I see. The first part suspiciously matched the error you showed, so I just wanted to be certain.
The next thing to check is your file names. Are they consistent? Do they end in `.fq.gz` and have `_L01_100_1` in their names? That naming is a bit odd, since Illumina files usually have names like `_L001_R1_001.fastq.gz`. Put a number of `echo` commands in your script and see what is produced at each step to debug the issue.

These are BGI RNA-seq files, for example `V40002080_L01_109_1.fq.gz` or `V40002080_L01_109_2.fq.gz`.
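A minimal debugging sketch of the `echo` suggestion, run on the example BGI names above instead of real files. It repeats the parsing steps from the original loop and prints each intermediate value; `sub1` is a hypothetical subfolder name and `describe` is a helper introduced for illustration. Running the whole job script with `bash -x` (or adding `set -x` near the top) also prints every command after variable expansion.

```bash
#!/bin/bash
# Reproduce the name parsing from the original loop and print each step.
INPATH1=/home/groups/dir/subdir    # placeholder path from the post

describe() {
    local file=$1
    local bname sample mate
    bname=$(basename "$file" .fq.gz)           # e.g. V40002080_L01_109_1
    sample=$(cut -d'_' -f 1,2,3 <<< "$bname")  # e.g. V40002080_L01_109
    mate=$(cut -d'_' -f 4 <<< "$bname")        # e.g. 1
    echo "file=$file"
    echo "  bname=$bname sample=$sample mate=$mate"
    # The original rebuilds the input path from INPATH1 and bname only,
    # so the subfolder ("sub1/") is missing from the result.
    echo "  rebuilt=$INPATH1/$bname.fq.gz"
}

describe sub1/V40002080_L01_109_1.fq.gz
describe sub1/V40002080_L01_109_2.fq.gz
```

Comparing the `file=` and `rebuilt=` lines shows directly whether the path handed to Trimmomatic still points at a real file.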
So some combination of `dir/subdir` and `_L01_100_1.fq.gz` is not being found, because the sample name is not being reconstituted properly.

Would it not be better to submit multiple jobs inside a `for` loop instead of submitting a single job like this?

I was thinking about it, but how can I put two variables (`input1` and `input2`) into one loop?
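One way to get both inputs into one loop is to iterate only over the `_1` files and derive the `_2` mate from each name, then pass both paths to a separate job with `qsub -v`. The sketch below assumes mates differ only in the `_1.fq.gz`/`_2.fq.gz` suffix; `trim_pair.pbs` (a job script that runs the Trimmomatic `PE` command on `"$input1"` and `"$input2"`), `INPATH`, `DRY_RUN` and `submit_pairs` are hypothetical names introduced for illustration. With the default `DRY_RUN=1` it only prints the `qsub` commands, so you can check the pairing before submitting anything.

```bash
#!/bin/bash
# Sketch: submit one PBS job per read pair instead of one large job.
shopt -s nullglob                         # skip the loop if nothing matches
INPATH=${INPATH:-/home/groups/dir/subdir} # placeholder path from the post
DRY_RUN=${DRY_RUN:-1}                     # 1 = print qsub commands, 0 = submit

submit_pairs() {
    local r1 r2
    for r1 in "$INPATH"/*/*_1.fq.gz; do
        r2="${r1%_1.fq.gz}_2.fq.gz"       # the mate differs only in the read suffix
        if [[ ! -f "$r2" ]]; then
            echo "no mate for $r1" >&2
            continue
        fi
        if [[ "$DRY_RUN" == 1 ]]; then
            echo qsub -v "input1=$r1,input2=$r2" trim_pair.pbs
        else
            # -v exports input1/input2 into the job's environment
            qsub -v "input1=$r1,input2=$r2" trim_pair.pbs
        fi
    done
}

submit_pairs
```

Inside `trim_pair.pbs`, the two paths are then available as `$input1` and `$input2`. Since `-v` takes a comma-separated list, this assumes the file paths themselves contain no commas.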