Exclude filename in Trimmomatics
Asked 3.7 years ago by storm1907

Hello,

I have a problem with BGI RNA-Seq files. When running Trimmomatic, my script always stops at sample 41 in the first lane of each cell (V300042149_L01_41_1.fq.gz), during the ILLUMINACLIP step. Requesting larger and faster resources from the server (more memory, more ppn, or new nodes with large memory) does not make sense, so I decided to just skip that file.

```bash
#!/bin/bash -x
#PBS -N trimmomatics_job
#PBS -q batch
#PBS -l walltime=72:00:00
#PBS -l feature=summer
#PBS -l feature=largescratch
#PBS -l nodes=wn01:ppn=12,mem=60gb
#PBS -W x=naccesspolicy:UNIQUEUSER
#PBS -j oe
#PBS -A job

INPATH=/dir/dir/dir/subdir/subdir/subdir
OUTPATH=/dir/dir/dir/subdir/subdir/subdir
cd <path to Trimmomatics tool>

# Globs that match nothing expand to nothing instead of the literal pattern
shopt -s nullglob

# Visit the input directory itself and each of its immediate subdirectories
for dir in "$INPATH"{/,/*/} ;
do
    # One iteration per read-1 file; the read-2 name is derived from it
    for file in "$dir"/*1.fq.gz ;
    do
        bname=$(basename "$file" '1.fq.gz')
        echo "file: $file"
        echo "$bname"
        input1="$dir/${bname}1.fq.gz"
        input2="$dir/${bname}2.fq.gz"
        output1="$OUTPATH/${bname}1.paired.fq.gz"
        output2="$OUTPATH/${bname}1.unpaired.fq.gz"
        output3="$OUTPATH/${bname}2.paired.fq.gz"
        output4="$OUTPATH/${bname}2.unpaired.fq.gz"
        echo "$input1" "$input2"
        find . -type f -name V300042149_L01_41_1.fq.gz -prune  -o  -exec trimmomatic-0.39.jar {} \;
        # PE output order: R1 paired, R1 unpaired, R2 paired, R2 unpaired
        java -jar trimmomatic-0.39.jar PE -threads 4 -phred33 \
            "$input1" "$input2" "$output1" "$output2" "$output3" "$output4" \
            ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
            LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    done
done
```

I have no idea how to write this line so that it excludes the file:

```bash
find . -type f -name V300042149_L01_41_1.fq.gz -prune  -o  -exec trimmomatic-0.39.jar {} \;
```

Thank you :)

Tags: RNA-Seq, next-gen

Nested loops in shell are a bad, bad choice. Surely there must be a better way: maybe create a table of input arguments first, and then run one command per line of that file?

You should use something like find with -not -name "*_41_*" to exclude the sample 41 files, but definitely look at creating a tabular file with input1, input2, output1, output2, output3 and output4 as columns, then executing the command once per line of that file. You can use the find I suggested while creating this tabular file, so that the table already excludes the 41 files.
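A minimal sketch of the table-building step, assuming the naming from the question (pairs like V300042149_L01_40_1.fq.gz / V300042149_L01_40_2.fq.gz) and the same output names as the original script. make_arg_table is a hypothetical helper name; the -maxdepth 2 limit is an assumption meant to mirror the original loop over $INPATH/ and $INPATH/*/.

```bash
#!/bin/bash
# Sketch: print a tab-separated argument table, one row per sample pair,
# skipping every file whose name matches *_41_*. Columns follow Trimmomatic's
# PE argument order: R1, R2, R1 paired, R1 unpaired, R2 paired, R2 unpaired.
# make_arg_table is a hypothetical helper, not part of any tool.

make_arg_table() {
    local inpath=$1 outpath=$2 r1 dir bname
    # -maxdepth 2 mirrors the original loop over $INPATH/ and $INPATH/*/.
    # -not -name '*_41_*' drops sample 41 in every lane (use '*_L01_41_*' to
    # skip lane 1 only). -not is a GNU/BSD extension; POSIX find spells it '!'.
    find "$inpath" -maxdepth 2 -type f -name '*_1.fq.gz' -not -name '*_41_*' | sort |
    while IFS= read -r r1; do
        dir=$(dirname "$r1")
        bname=$(basename "$r1" '1.fq.gz')   # e.g. V300042149_L01_40_
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$r1" "$dir/${bname}2.fq.gz" \
            "$outpath/${bname}1.paired.fq.gz" "$outpath/${bname}1.unpaired.fq.gz" \
            "$outpath/${bname}2.paired.fq.gz" "$outpath/${bname}2.unpaired.fq.gz"
    done
}
```

Usage would be something like make_arg_table "$INPATH" "$OUTPATH" > trimmomatic_args.tsv, after which a single flat loop (while IFS=$'\t' read -r in1 in2 out1 out2 out3 out4) can make one Trimmomatic call per row. Writing the table to a file also lets you inspect exactly which samples will be processed before submitting the job.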

Alternatively, your file could contain just the bname values, and you could construct everything else on the fly. Just try to get rid of the loop; it is shell abuse.
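A minimal sketch of the "bname values only" variant, assuming all read files sit in one input directory (unlike the original per-subdirectory layout). INDIR, OUTDIR, TRIMMOMATIC_JAR, ADAPTERS, THREADS and the list file samples.txt are hypothetical placeholders; JAVA can be overridden (e.g. JAVA=echo) to print the commands instead of running them.

```bash
#!/bin/bash
# Sketch: samples.txt (hypothetical) holds one prefix per line, e.g.
# V300042149_L01_40_, and every path is built from it. INDIR, OUTDIR,
# TRIMMOMATIC_JAR and ADAPTERS are placeholder variables the caller must set;
# JAVA defaults to java and THREADS to 4.

run_one() {
    local bname=$1
    "${JAVA:-java}" -jar "$TRIMMOMATIC_JAR" PE -threads "${THREADS:-4}" -phred33 \
        "$INDIR/${bname}1.fq.gz" "$INDIR/${bname}2.fq.gz" \
        "$OUTDIR/${bname}1.paired.fq.gz" "$OUTDIR/${bname}1.unpaired.fq.gz" \
        "$OUTDIR/${bname}2.paired.fq.gz" "$OUTDIR/${bname}2.unpaired.fq.gz" \
        "ILLUMINACLIP:${ADAPTERS}:2:30:10" \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}

run_all() {
    local list=$1 bname
    # One flat loop over the list: one Trimmomatic call per non-empty line.
    # Sample 41 is skipped simply by leaving it out of the list.
    while IFS= read -r bname; do
        [ -n "$bname" ] || continue
        # </dev/null keeps the called program from consuming the rest of the list.
        run_one "$bname" </dev/null
    done < "$list"
}
```

After setting the four placeholder variables, run_all samples.txt processes every listed sample in turn. The list itself could be produced once with a find command like the one suggested above, so the exclusion logic lives in one place instead of inside the loop.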


Oh, and by the way, it's Trimmomatic, not "Trimmomatics".


Another point: you'll almost never need to cd into a tool's directory to run it; only badly built tools work that way. Instead, use something like

```bash
java -jar /path/to/trimmomatic/trimmomatic-0.39.jar ...
```

from your working directory, so that any files the tool writes to its working directory don't clog up the tool's own directory.
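Following that advice, a small wrapper function can call the jar by its full path while the job stays in its own working directory. A minimal sketch, assuming the jar lives under the /path/to/trimmomatic placeholder from the reply; the wrapper name trimmomatic and the TRIMMOMATIC_DIR and JAVA variables are hypothetical.

```bash
#!/bin/bash
# Sketch: call Trimmomatic by its full path instead of cd-ing into the tool
# directory. TRIMMOMATIC_DIR (default: the /path/to/trimmomatic placeholder)
# is an assumption; JAVA may be overridden (e.g. JAVA=echo) for a dry run.

trimmomatic() {
    local tool_dir=${TRIMMOMATIC_DIR:-/path/to/trimmomatic}
    local jar="$tool_dir/trimmomatic-0.39.jar"
    # Fail early with a readable message if the jar is not where we expect it.
    if [ ! -f "$jar" ]; then
        echo "trimmomatic: jar not found: $jar" >&2
        return 1
    fi
    # Pass all remaining arguments (PE, file names, trimming steps) through unchanged.
    "${JAVA:-java}" -jar "$jar" "$@"
}
```

One side effect to watch: a relative adapter path such as ILLUMINACLIP:BGI_Adapters.fa is presumably resolved against the current working directory, as usual for Java file paths, so once the script stops cd-ing into the tool directory it should get a full path, e.g. ILLUMINACLIP:/path/to/trimmomatic/BGI_Adapters.fa:2:30:10 (assuming the adapter file sits next to the jar).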
