How to loop over a directory to set each file as a variable?
4.9 years ago
DNAngel ▴ 250

Hi all,

I have a bash script that I call with command-line arguments to feed specific files into it. It works fine, but I want to improve it by looping over a new directory so that each file in it is used in turn as the argument. For example, here is my script:

#!/bin/bash
REF="$1"
NAME="$2"

# bwa index writes $REF.bwt (among other files), so its presence means REF is indexed
if [ -f "$REF.bwt" ]
then
    echo "$REF already indexed, skipping!"
else
    echo "Indexing $REF"
    bwa index "$REF"
fi

#Use the indexed REF as input
for files in ../cleaned_files/*.txt
do
    if [ -f "${files%%.txt}_$NAME.sam" ]
    then
        echo "$files already cleaned, skipping!"
    else
        echo "Running assembly for $files"
        bwa mem -B 2 -t 40 "$REF" "$files" > "${files%%.txt}_$NAME.sam"
    fi
done
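The output file name in the loop comes from bash parameter expansion: `${files%%.txt}` strips the trailing `.txt` from the input path, and `_$NAME.sam` is appended. A minimal sketch with hypothetical example values (`sample1.txt`, `ref_1a.fa`) shows the result, plus how `basename` can derive a label from a reference file name:

```shell
#!/bin/bash
# Hypothetical example values, only to show how the names are built
files="../cleaned_files/sample1.txt"
NAME="1a_done"

# Strip the trailing ".txt", then append "_<label>.sam"
sam_out="${files%%.txt}_$NAME.sam"
echo "$sam_out"    # -> ../cleaned_files/sample1_1a_done.sam

# basename with a suffix argument drops the directory and the ".fa" extension
ref="../reference_files/ref_1a.fa"
label=$(basename "$ref" .fa)
echo "$label"      # -> ref_1a
```

Because the skip check looks for exactly this `.sam` name, the label is what decides whether a file counts as already processed.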

The script continues with other steps after this. However, I now have a directory with hundreds of new reference files that I want to pass to it. Normally I run the script from the terminal like this:

 $ ./my_script.sh ref_1a.fa 1a_done

where ref_1a.fa is the reference file I supply and 1a_done is just a label I use for naming the output. The script already runs on every raw file in my cleaned_files directory, but I am unsure how to add another for loop over my $REF files without breaking the script (which took me forever to get working, because I am NOT great at scripting). I was thinking:

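#!/bin/bash
# Process one reference at a time: index it, then align every cleaned file
# to it, before the outer loop moves on to the next reference.
for ref_files in ../reference_files/*.fa
do
    REF="$ref_files"
    # One output label per reference (ref_1a.fa -> ref_1a); with a single fixed
    # NAME, every reference after the first would hit the .sam skip check.
    NAME=$(basename "$ref_files" .fa)

    if [ -f "$REF.bwt" ]
    then
        echo "$REF already indexed, skipping!"
    else
        echo "Indexing $REF"
        bwa index "$REF"
    fi

    #Use the indexed REF as input
    for files in ../cleaned_files/*.txt
    do
        if [ -f "${files%%.txt}_$NAME.sam" ]
        then
            echo "$files already cleaned, skipping!"
        else
            echo "Running assembly for $files"
            bwa mem -B 2 -t 40 "$REF" "$files" > "${files%%.txt}_$NAME.sam"
        fi
    done
done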
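To check the loop order locally before submitting anything to the queue, one option is a dry run that prints each step instead of calling `bwa`. This is a sketch; `dry_run_order` is a hypothetical helper that takes the reference and reads directories as arguments:

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints what the nested loop would do, in order,
# without running bwa. Usage: dry_run_order REF_DIR READS_DIR
dry_run_order() {
    local ref_dir=$1 reads_dir=$2
    local ref reads
    for ref in "$ref_dir"/*.fa
    do
        [ -e "$ref" ] || continue      # no .fa files: the glob stays literal, skip it
        echo "index $(basename "$ref")"
        # The inner loop runs to completion for this reference before the
        # outer loop picks up the next one.
        for reads in "$reads_dir"/*.txt
        do
            [ -e "$reads" ] || continue
            echo "  align $(basename "$reads") to $(basename "$ref")"
        done
    done
}
```

Running `dry_run_order ../reference_files ../cleaned_files` should print one `index` line per reference, each followed by an `align` line for every cleaned file.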

With this second script, would it index the first reference file from reference_files and use it for all the raw data files in cleaned_files before moving on, or would it try to index ALL my reference files first? I need my_script.sh to finish for one reference file and then repeat the whole thing for the next one. Could someone take a quick look and tell me whether my approach is right before I submit it to the server? Jobs there wait in a queue before they run, which is why I am hesitant to keep submitting different versions of the script. Thank you all so much :)
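An alternative that leaves my_script.sh unchanged is a small wrapper that calls it once per reference file, deriving the naming label from the file name. A minimal sketch, assuming the `ref_1a.fa -> ref_1a` label scheme (hypothetical) and a hypothetical function name `run_all_refs`:

```shell
#!/bin/bash
# Hypothetical wrapper: run_all_refs DIR CMD...
# Runs CMD once per *.fa file in DIR, passing the reference path and a label
# derived from its file name (ref_1a.fa -> ref_1a). Each call finishes before
# the next starts, so references are processed strictly one at a time.
run_all_refs() {
    local dir=$1
    shift
    local ref name
    for ref in "$dir"/*.fa
    do
        [ -e "$ref" ] || continue      # no .fa files: the glob stays literal, skip it
        name=$(basename "$ref" .fa)    # assumed label scheme
        "$@" "$ref" "$name"
    done
}
```

Usage would be `run_all_refs ../reference_files ./my_script.sh`. Giving each reference its own label matters because the script's skip check looks for `${files%%.txt}_$NAME.sam`; reusing one label for every reference would make every reference after the first skip all its alignments.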


You are reinventing the wheel. Use a workflow manager such as Nextflow: https://www.nextflow.io/
