Question: How to loop over many folders and process the file in each folder
tcf.hcdg (European Union) wrote, 2.1 years ago:

I have 15 folders, and each folder contains a *.gz file. I would like to use Trimmomatic to filter all the files. For this I would like to write something that opens each folder, reads that specific file, does the filtering as specified in the code, and finally saves the result in the same folder with a different file extension. What I did (PBS script):

#!/bin/bash
#PBS -N Trimmomatics_filtering
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -l vmem=23gb
#PBS -q ext_chem_guest

# Go to the Trimmomatics directory
 cd /home/tb44227/bioinfo_packages/Trimmomatic/Trimmomatic-0.36
    # Java module load
module load java/1.8.0-162

# Input File (I have a list of 15 folders and each contained fastq.gz file)
inputFile= for f in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_{1..15}/*fastq.gz; $f

# Start the code to filter the file  and save the results in the same folder where the input file is
   java -jar trimmomatic-0.36.jar SE  -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 $inputFile  $outputFile

# Output File
outputFile=$inputFile{.TRIMMIMG}

My question is: how can I define $inputFile and $outputFile so that the script runs on all 15 files? I tried the above, but it seems that the input and output definitions are not correct.

Thanks


Can you show us your file structure? I'm not sure I fully understand.

Are you saying you want to run the java command for every file in multiple directories, and have the output go back in to the directory the input file came from?

Reply written 2.1 years ago by Joe

It's a normal FASTQ file:

tb44227@lido-gw02:~/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/754_1> zcat 754_1_ATCACG_L006_R1_001.fastq.gz | head

@SEQILMN03:400:CBEFMANXX:6:1101:1200:1889 1:N:0:NTCACG
NACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTTGGAATTCTCGGG
+
#<BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1463:1895 1:N:0:ATCACG
NCGGACCAGGCTTCATTCCCCTGGAATTCTCGGGTGCCAAGGAACTCCAG
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SEQILMN03:400:CBEFMANXX:6:1101:1272:1914 1:N:0:ATCACG
NCTGAGGCATCCTAACAGACCGGTAGACTTGAACTGGAATTCTCGGGTGC

Yes, I would like to run Java over all the files in the different directories. Java runs fine on an individual file, which means there is some problem with how I define or loop over the input files.

Reply written 2.1 years ago by tcf.hcdg

Well, one of the obvious issues with your code is that you don't define your output file until after the java command.
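
To illustrate the ordering point: the shell expands a variable at the moment the command runs, so a name that is only assigned further down the script is still empty at that point. A minimal sketch, using the file name from the listing above; the _trimmed.fastq.gz suffix is just a placeholder, not something from the original script:

```shell
#!/bin/bash
# Assignments take no spaces around '='.
inputFile="754_1_ATCACG_L006_R1_001.fastq.gz"

# outputFile has not been assigned yet, so it expands to an empty string here.
before="input=${inputFile} output=${outputFile:-}"

# Assign first, then use. The "_trimmed.fastq.gz" suffix is a placeholder.
outputFile="${inputFile%.fastq.gz}_trimmed.fastq.gz"
after="input=${inputFile} output=${outputFile}"

echo "$before"
echo "$after"
```

The same applies inside a loop: both names have to be set in the loop body before the java line that uses them.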

Reply written 2.1 years ago by Joe

Joe (United Kingdom) wrote, 2.1 years ago:

Try something like this:

for dir in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/*/ ; do
    cd "$dir"
    for file in ./*.fastq.gz ; do
        java -jar trimmomatic-0.36.jar SE  -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 "$file" "${file%.*}"_trimming.fastq
    done
    cd ../
done

This is untested, so the file paths etc may not be exactly right. I would test the "${dir}""${file%.*}"_trimming.fastq part in particular, and make sure the cd ../ returns to the correct directory, before you set this running on everything.
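
One way to do that testing is a dry run that only prints the commands. The sketch below is a variant, not the loop above: it globs with full paths so no cd is needed, and it refers to the jar by an absolute path, because a bare jar name is resolved against the current directory after a cd. BASE, JAR and the _trimmed suffix are placeholders I made up. It also puts the input and output files before the trimming steps, which is the order I recall from the Trimmomatic manual; please check that against the usage message of your version.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of running them.
# Assumptions (not from the thread): BASE, JAR and the "_trimmed" suffix
# are placeholders; replace them with your own paths before use.
BASE="${BASE:-/path/to/delivery_25092017}"
JAR="${JAR:-/path/to/Trimmomatic-0.36/trimmomatic-0.36.jar}"
THREADS="${PBS_NUM_PPN:-1}"   # fall back to 1 outside a PBS job

shopt -s nullglob             # an unmatched glob expands to nothing

build_commands() {
    local file out
    for file in "$BASE"/*/*.fastq.gz; do
        # Strip the full ".fastq.gz" suffix, then append the new ending.
        out="${file%.fastq.gz}_trimmed.fastq"
        # Input and output are placed before the trimming steps.
        echo java -jar "$JAR" SE -threads "$THREADS" -phred33 \
            "$file" "$out" \
            SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17
    done
}

build_commands
```

Once the printed lines look right, dropping the echo in front of java turns the dry run into the real thing.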


Thanks. It worked

for dir in /home/tb44227/nobackup/small_RNAseq_260917/support.igatech.it/sequences-export/536-RNA-seq_Disco_TuDO/delivery_25092017/*/ ; do
    cd "$dir"
    for file in ./*.fastq.gz ; do
        java -jar trimmomatic-0.36.jar SE  -threads ${PBS_NUM_PPN} -phred33 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:17 "$file" "${file%.*}"_trimming.fastq
    done
    cd ../
done

The final "done" was missing, and the "${dir}" prefix on the output file name was not needed.
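
For anyone reusing this, here is what the "${file%.*}" expansion produces for one of the file names, next to a variant that strips the whole ".fastq.gz" ending. This is only a small illustration; the second form is an alternative I am suggesting, not what was run above.

```shell
#!/bin/bash
# File name taken from the listing earlier in the thread.
file="./754_1_ATCACG_L006_R1_001.fastq.gz"

# "%.*" removes the shortest trailing match of ".*", i.e. only ".gz".
short="${file%.*}_trimming.fastq"

# Naming the suffix explicitly removes ".fastq.gz" as a whole.
explicit="${file%.fastq.gz}_trimming.fastq"

echo "$short"      # ./754_1_ATCACG_L006_R1_001.fastq_trimming.fastq
echo "$explicit"   # ./754_1_ATCACG_L006_R1_001_trimming.fastq
```

Because "$file" already starts with "./" after the cd, the output lands in the current directory without any "${dir}" prefix.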

Reply written 2.1 years ago by tcf.hcdg

Oh yes, I see there was a small error in my formatting, but it is fixed now.

Reply written 2.1 years ago by Joe

shenwei356 (China) wrote, 2.1 years ago:

Try easy_qsub for easily submitting multiple PBS jobs. On a cluster, it's better than submitting one job that handles multiple files.

easy_qsub 'echo {} > {}.out' dir/*.fq.gz
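
If you want to see the general idea of "one job per file" in plain bash, a rough sketch follows. I am not claiming this is how easy_qsub works internally. The function name make_jobs, the job directory and the resource line are hypothetical, and the payload simply mirrors the echo example above. Nothing is submitted; the qsub commands are only printed.

```shell
#!/bin/bash
# Sketch: write one PBS job script per input file (dry run, nothing is submitted).
# Assumptions: make_jobs, the job directory and the resource request are
# placeholders; the payload mirrors the 'echo {} > {}.out' example.
shopt -s nullglob

make_jobs() {
    local indir="$1" jobdir="$2" file script
    mkdir -p "$jobdir"
    for file in "$indir"/*.fq.gz; do
        script="$jobdir/$(basename "$file").pbs"
        # Unquoted EOF: $file is expanded now, while the script is written.
        cat > "$script" <<EOF
#!/bin/bash
#PBS -l nodes=1:ppn=1
echo "$file" > "$file.out"
EOF
        # Print the submit command; remove 'echo' to actually submit.
        echo qsub "$script"
    done
}

# Only run when both arguments are given: input directory and job directory.
if [ "$#" -eq 2 ]; then
    make_jobs "$1" "$2"
fi
```

Saved under a name of your choice, it would be called with the input directory and a directory for the generated job scripts.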