How to loop over many folders and run a command on the file in each one
6.1 years ago
tcf.hcdg ▴ 70

I have 15 folders, and each folder contains a *.gz file. I would like to use Trimmomatic to filter all of these files. To do this, I want a script that opens each folder, reads that specific file, runs the filtering specified in the code, and saves the result in the same folder with a different file extension. This is what I have so far (PBS script):

#!/bin/bash
#PBS -N Trimmomatics_filtering
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -l vmem=23gb
#PBS -q ext_chem_guest

# Go to the Trimmomatic directory
cd /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36
# Load the Java module
module load java/1.8.0-162

# Input file (I have 15 folders and each contains a fastq.gz file)
inputFile= for f in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_{1..15}/*fastq.gz; $f

# Run the filtering and save the result in the same folder as the input file
java -jar trimmomatic-0.36.jar SE -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 $inputFile $outputFile

# Output file
outputFile=$inputFile{.TRIMMIMG}

My question is: how can I define $inputFile and $outputFile so that the script runs on all 15 files? I tried the script above, but the input and output definitions seem to be wrong.
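
To illustrate the naming part of the question: bash suffix removal can derive each output name from its input name, and outputFile has to be assigned before the java line that uses it. The sample file name below is of the kind used in this thread; the "_trimmed" suffix is an arbitrary choice, not anything Trimmomatic requires.

```shell
#!/bin/bash
# Deriving an output name from an input name with bash suffix removal.
# "${var%pattern}" deletes the shortest match of pattern from the end of $var.
inputFile=754_1/754_1_ATCACG_L006_R1_001.fastq.gz

# $inputFile{.TRIMMIMG} only appends the literal text "{.TRIMMIMG}"
echo "$inputFile{.TRIMMIMG}"   # 754_1/754_1_ATCACG_L006_R1_001.fastq.gz{.TRIMMIMG}

# ${inputFile%.*} drops only the last extension, so ".fastq" stays in the name
echo "${inputFile%.*}"         # 754_1/754_1_ATCACG_L006_R1_001.fastq

# Stripping the whole ".fastq.gz" gives a cleaner name ("_trimmed" is arbitrary)
outputFile="${inputFile%.fastq.gz}_trimmed.fastq"
echo "$outputFile"             # 754_1/754_1_ATCACG_L006_R1_001_trimmed.fastq
```

Inside a loop, the same two assignments go at the top of the loop body, before the java command.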

Thanks

Can you show us your file structure? I'm not sure I fully understand.

Are you saying you want to run the java command for every file in multiple directories, and have the output go back in to the directory the input file came from?

It's a normal FASTQ file:

tb44227@lido-gw02:~/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_1>zcat 754_1_ATCACG_L006_R1_001.fastq.gz | head

@SEQILMN03:400:CBEFMANXX:6:1101:1200:1889 1:N:0:NTCACG
NACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTTGGAATTCTCGGG
+
#<BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1463:1895 1:N:0:ATCACG
NCGGACCAGGCTTCATTCCCCTGGAATTCTCGGGTGCCAAGGAACTCCAG
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1272:1914 1:N:0:ATCACG
NCTGAGGCATCCTAACAGACCGGTAGACTTGAACTGGAATTCTCGGGTGC

Yes, I would like to run the java command on all the files in the different directories. The java command works on an individual file, so the problem must be in how I define and loop over the input files.

Well, one of the obvious issues with your code is that you don't define your output file until after the java command.

6.1 years ago
Joe 21k

Try something like this:

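for dir in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/*/ ; do
    cd "$dir" || continue
    for file in ./*.fastq.gz ; do
        # Full path to the jar, since after cd the working directory is the data folder
        # Input and output go before the trimming steps, as in the Trimmomatic manual
        java -jar /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar SE -threads ${PBS_NUM_PPN} -phred33 "$file" "${file%.*}"_trimming.fastq SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17
    done
    cd ../
done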

This is untested, so the file paths etc may not be exactly right. I would test the "${dir}""${file%.*}"_trimming.fastq part in particular, and make sure the cd ../ returns to the correct directory, before you set this running on everything.
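
In the spirit of testing before running on everything, a dry-run sketch can print each command instead of executing it. It uses full paths, so no cd is needed and each output lands next to its input, and it puts input and output before the trimming steps, which is the SE argument order the Trimmomatic manual documents. The helper name print_trim_commands is hypothetical; the jar and data paths are the ones from the question.

```shell
#!/bin/bash
# Dry run: print one Trimmomatic command per .fastq.gz file instead of running it.
# Remove "echo" once the printed commands look right.
JAR=/home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar

# Hypothetical helper; $1 is the parent folder that holds 754_1 ... 754_15
print_trim_commands() {
    local base="$1" file out
    shopt -s nullglob   # no match -> no iterations, instead of a literal "*" pattern
    for file in "$base"/*/*.fastq.gz; do
        # Output goes next to the input; full paths make cd unnecessary
        out="${file%.fastq.gz}_trimming.fastq"
        # Falls back to 1 thread when PBS_NUM_PPN is unset (outside a PBS job)
        echo java -jar "$JAR" SE -threads "${PBS_NUM_PPN:-1}" -phred33 \
            "$file" "$out" SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17
    done
}

print_trim_commands /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017
```

Running this inside an interactive shell on the cluster should print 15 lines, one per folder, if each folder holds exactly one .fastq.gz file.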

Thanks, it worked.

The final "done" was missing, and "${dir}" was superfluous in the output file name.

Oh yes, I see there was a small error in my formatting; it's fixed now.

6.1 years ago

Try easy_qsub to submit multiple PBS jobs easily. On a cluster, submitting one job per file is better than submitting a single job that handles all the files.

easy_qsub 'echo {} > {}.out' dir/*.fq.gz
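
easy_qsub's own options are not covered here. As a plainer alternative that uses only qsub, the sketch below submits one job per folder (each folder holds one file, so this is one job per file). It relies on qsub's -v option, available in Torque and PBS Pro, to pass the folder to the job as $DIR; the helper submit_per_folder and the job script trim_one_folder.pbs are hypothetical names.

```shell
#!/bin/bash
# Submit one PBS job per folder instead of one job that loops over all 15.
# Hypothetical helper: $1 = parent folder of 754_1 ... 754_15, $2 = job script.
submit_per_folder() {
    local base="$1" script="$2" dir
    for dir in "$base"/754_*/; do
        [ -d "$dir" ] || continue   # skip the literal pattern if nothing matched
        # qsub -v exports DIR into the job's environment; -v splits its list
        # on commas, so this assumes the folder paths contain none
        qsub -v "DIR=$dir" "$script"
    done
}

# trim_one_folder.pbs (hypothetical) would hold the #PBS header and the
# module load line from the question, followed by something like:
#   for file in "$DIR"*.fastq.gz; do
#       java -jar /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar \
#           SE -threads "$PBS_NUM_PPN" -phred33 "$file" "${file%.fastq.gz}_trimming.fastq" \
#           SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17
#   done
```

It would be called as, for example, submit_per_folder /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017 trim_one_folder.pbs, which queues 15 independent jobs.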