Question: How to loop over many folders and read the file in each folder
9 months ago, tcf.hcdg (European Union) wrote:

I have 15 folders, and each folder contains a *.gz file. I would like to use Trimmomatic to filter all of these files. To do this, I want to write something that enters each folder, reads that folder's file, runs the filtering specified in the code, and finally saves the results in the same folder under a different file extension. Here is what I did (PBS script):

#!/bin/bash
#PBS -N Trimmomatics_filtering
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -l vmem=23gb
#PBS -q ext_chem_guest

# Go to the Trimmomatic directory
cd /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36
# Load the Java module
module load java/1.8.0-162

# Input file (I have 15 folders, and each contains a fastq.gz file)
inputFile= for f in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_{1..15}/*fastq.gz; $f

# Filter the file and save the results in the same folder as the input file
java -jar trimmomatic-0.36.jar SE -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 $inputFile  $outputFile

# Output File
outputFile=$inputFile{.TRIMMIMG}

My question is: how can I define $inputFile and $outputFile so that the script processes all 15 files? I tried the above, but the input and output definitions seem to be wrong.

Thanks

Question written 9 months ago by tcf.hcdg; last modified 9 months ago by shenwei356.

Can you show us your file structure? I'm not sure I fully understand.

Are you saying you want to run the java command on every file in multiple directories, and have the output go back into the directory the input file came from?

Reply written 9 months ago by jrj.healey

It's a normal FASTQ file:

tb44227@lido-gw02:~/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_1>zcat 754_1_ATCACG_L006_R1_001.fastq.gz | head

@SEQILMN03:400:CBEFMANXX:6:1101:1200:1889 1:N:0:NTCACG
NACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTTGGAATTCTCGGG
+
#<BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1463:1895 1:N:0:ATCACG
NCGGACCAGGCTTCATTCCCCTGGAATTCTCGGGTGCCAAGGAACTCCAG
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1272:1914 1:N:0:ATCACG
NCTGAGGCATCCTAACAGACCGGTAGACTTGAACTGGAATTCTCGGGTGC

Yes, I would like to run the java command on all the files in the different directories. It works on an individual file, so the problem must be in how I define and loop over the input files.

Reply written 9 months ago by tcf.hcdg

Well, one of the obvious issues with your code is that you don't define your output file until after the java command.

Reply written 9 months ago by jrj.healey
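To make that concrete, here is a minimal sketch of the question's own script with both variables fixed: the loop variable itself is the input file, and outputFile is assigned inside the loop before java runs. It is untested on the real cluster. The jar path comes from the question's cd line; the function name trim_folders, the .TRIMMED.fastq suffix (the question's $inputFile{.TRIMMIMG} is not valid bash) and the fallback to one thread outside PBS are my own assumptions.

```bash
#!/usr/bin/env bash
# Sketch of the question's script with $inputFile and $outputFile fixed.
# Assumed: jar path from the question's cd line; ".TRIMMED.fastq" suffix is hypothetical.
TRIMMOMATIC_JAR=/home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar

# trim_folders BASE_DIR: trims the fastq.gz inside BASE_DIR/754_1 to BASE_DIR/754_15
trim_folders() {
    local base_dir=$1 inputFile outputFile
    # The loop variable itself is the input file; no separate assignment needed
    for inputFile in "$base_dir"/754_{1..15}/*.fastq.gz ; do
        [ -e "$inputFile" ] || continue   # folder missing or empty: glob left unexpanded
        # Define the output BEFORE java uses it: same folder, new extension
        outputFile="${inputFile%.fastq.gz}.TRIMMED.fastq"
        java -jar "$TRIMMOMATIC_JAR" SE -threads "${PBS_NUM_PPN:-1}" -phred33 \
            SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 "$inputFile" "$outputFile"
    done
}
```

In the PBS script you would call it once, after module load, e.g. trim_folders /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017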
9 months ago, jrj.healey (United Kingdom) wrote:

Try something like this:

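for dir in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/*/ ; do
    # Work inside each sample folder (754_1 to 754_15) so the output lands next to the input
    cd "$dir"
    for file in ./*.fastq.gz ; do
        # The jar is given by absolute path because we are no longer in the Trimmomatic directory
        java -jar /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar SE -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 "$file" "${file%.*}"_trimming.fastq
    done
    # Go back up to delivery_25092017 before the next folder
    cd ../
done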

This is untested, so the file paths etc. may not be exactly right. I would test the "${dir}""${file%.*}"_trimming.fastq part in particular, and make sure the cd ../ returns to the correct directory, before you set this running on everything.
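One way to do that testing is a dry run that only prints each input and the output name it would get, without starting java. This is a minimal sketch assuming the same delivery_25092017 layout; preview_outputs is a hypothetical helper name, and it builds absolute paths instead of using cd.

```bash
#!/usr/bin/env bash
# Dry run: list the input -> output pairs the loop would produce, without calling java.
# preview_outputs BASE_DIR, where BASE_DIR is the delivery_25092017 folder (assumed layout).
preview_outputs() {
    local base_dir=$1 dir file
    for dir in "$base_dir"/*/ ; do
        for file in "$dir"*.fastq.gz ; do
            [ -e "$file" ] || continue   # folder without a fastq.gz: glob left unexpanded
            # Same naming rule as the loop above: ${file%.*} drops only the final ".gz"
            printf '%s -> %s\n' "$file" "${file%.*}_trimming.fastq"
        done
    done
}
```

With 15 folders this should print 15 lines. Note that ${file%.*} strips only .gz, so the names end in .fastq_trimming.fastq; if you would rather get 754_1_ATCACG_L006_R1_001_trimming.fastq, ${file%.fastq.gz} removes the whole double extension.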

Answer written 9 months ago by jrj.healey (modified 9 months ago)

Thanks, it worked.

The final done was missing, and "${dir}" was superfluous in the output file name.

Reply written 9 months ago by tcf.hcdg

Oh yes, I see there was a small error in my formatting; it's fixed now.

Reply written 9 months ago by jrj.healey
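For reference, the cd ../ step can be avoided entirely by running each folder's work in a subshell: the directory change is undone automatically when the subshell exits, so the loop never depends on getting back to the right place. A sketch under the same assumptions as the working loop; trim_each_folder is a hypothetical name, the jar is called by its absolute path from the question, and ${file%.fastq.gz} is used so 754_1_ATCACG_L006_R1_001.fastq.gz becomes 754_1_ATCACG_L006_R1_001_trimming.fastq.

```bash
#!/usr/bin/env bash
# Assumed jar location, taken from the question's cd line
TRIMMOMATIC_JAR=/home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar

# trim_each_folder BASE_DIR: trims every *.fastq.gz in each sub-folder of BASE_DIR
trim_each_folder() {
    local base_dir=$1 dir
    for dir in "$base_dir"/*/ ; do
        # The parentheses start a subshell: its cd does not affect the outer loop,
        # so no "cd ../" is needed afterwards
        (
            cd "$dir" || exit 1          # exits only the subshell
            for file in ./*.fastq.gz ; do
                [ -e "$file" ] || continue
                java -jar "$TRIMMOMATIC_JAR" SE -threads "${PBS_NUM_PPN:-1}" -phred33 \
                    SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 \
                    "$file" "${file%.fastq.gz}_trimming.fastq"
            done
        )
    done
}
```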
9 months ago, shenwei356 (China) wrote:

Try easy_qsub to easily submit multiple PBS jobs. On a cluster, this is better than submitting one job that handles multiple files.

easy_qsub 'echo {} > {}.out' dir/*.fq.gz
Answer written 9 months ago by shenwei356
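If easy_qsub is not installed, the same one-job-per-file idea can be sketched with plain qsub: loop over the files and hand each path to a single-file job script through qsub -v. This swaps easy_qsub for a hand-written loop; submit_per_file and trim_one.pbs are hypothetical names, and it assumes a Torque/PBS qsub whose -v option exports variables into the job (a path containing a comma would break the -v list).

```bash
#!/usr/bin/env bash
# Sketch: submit one PBS job per fastq.gz with plain qsub (instead of easy_qsub).
# The job script is hypothetical; it would hold the #PBS lines from the question
# and trim the single file passed in $INPUT, e.g.:
#   java -jar /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36/trimmomatic-0.36.jar \
#       SE -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 \
#       "$INPUT" "${INPUT%.fastq.gz}_trimming.fastq"

# submit_per_file BASE_DIR JOB_SCRIPT: one qsub call per BASE_DIR/*/*.fastq.gz
submit_per_file() {
    local base_dir=$1 job_script=$2 file
    for file in "$base_dir"/*/*.fastq.gz ; do
        [ -e "$file" ] || continue      # no match: the glob stays unexpanded
        # -v passes the file path into the job's environment as INPUT
        qsub -v INPUT="$file" "$job_script"
    done
}
```

Usage would be along the lines of submit_per_file /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017 trim_one.pbs, which queues 15 independent jobs that the scheduler can run in parallel.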