How to loop over a directory to set each file as a variable?
3.1 years ago
DNAngel ▴ 240

Hi all,

I have a bash script that I call with command-line arguments to feed specific files into it. It works fine, but I want to optimize it by looping over a new directory so that each file in it is used as the argument in turn. For example, my script is below:

#!/bin/bash
REF=$1
NAME=$2

if [ -f $REF.bwt ]
then
    echo "$REF already indexed, skipping!"
else
    echo "Indexing $REF"
    bwa index $REF
fi

#Use the indexed REF as input
for files in ../cleaned_files/*.txt
do
    if [ -f ${files%%.txt}_$NAME.sam ]
    then
        echo "$files already cleaned, skipping!"
    else
        echo "Running assembly for $files"
        bwa mem -B 2 -t 40 $REF ${files} > ${files%%.txt}_$NAME.sam
    fi
done


This script continues for a while with other steps. However, I have a directory with hundreds of new reference files that I want to run through my script. Normally, in the terminal, I execute my script as:

$ ./my_script.sh ref_1a.fa 1a_done

Where ref_1a.fa is the file I input, and 1a_done is just a naming variable I use. My script already runs on each raw file in my cleaned_files directory, but I am unsure how to properly add another for loop over my $REF files without breaking my script (which took me forever to get working because I am NOT great at scripting). I was thinking:

#!/bin/bash
REF=$1
NAME=$2

for ref_files in ../reference_files/*.fa
do
    if [ -f $REF.bwt ]
    then
        echo "$REF already indexed, skipping!"
    else
        echo "Indexing $REF"
        bwa index $REF
    fi

    #Use the indexed REF as input
    for files in ../cleaned_files/*.txt
    do
        if [ -f ${files%%.txt}_$NAME.sam ]
        then
            echo "$files already cleaned, skipping!"
        else
            echo "Running assembly for $files"
            bwa mem -B 2 -t 40 $REF ${files} > ${files%%.txt}_$NAME.sam
        fi
    done


With this second script, would it index the first reference file from reference_files and then use it for all the raw data files in cleaned_files, or would it try to index ALL my reference files first? I need my_script.sh to finish completely for one reference file, and then repeat the whole script for the next reference file. Could anyone take a look and quickly tell me whether my process is right before I load it onto the server? Jobs there don't execute right away because of queues, which is why I am hesitant to submit different versions of my script and clog up the queue. Thank you all so much :)
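
On the ordering question: in bash, a nested for loop runs the whole body of the outer loop for one reference (the index check, then every file in cleaned_files) before moving on to the next reference, so it would not index everything first. Inside the loop, though, the loop variable has to be used instead of $REF, otherwise every pass works on the same $1. Below is a minimal dry-run sketch that only prints the order of the steps and runs no bwa commands, so it can be checked before anything goes into the queue. The function name plan_runs, its two directory parameters, and the way the output name is derived from the reference file name (ref_1a.fa becomes ref_1a_done) are assumptions made for illustration, not part of the original script.

```shell
#!/bin/bash
# Dry-run sketch: prints the order in which steps would happen, nothing more.
# plan_runs, ref_dir and data_dir are hypothetical names; the directories
# stand in for ../reference_files and ../cleaned_files.
plan_runs() {
    local ref_dir=$1 data_dir=$2
    local ref name reads out
    for ref in "$ref_dir"/*.fa; do
        [ -e "$ref" ] || continue            # glob matched no .fa files
        # Assumed naming scheme: ref_1a.fa -> ref_1a_done
        name=$(basename "$ref" .fa)_done
        if [ -f "$ref.bwt" ]; then
            echo "skip index: $ref"
        else
            echo "index: $ref"               # real script: bwa index "$ref"
        fi
        # The inner loop finishes for this reference before the next one starts.
        for reads in "$data_dir"/*.txt; do
            [ -e "$reads" ] || continue      # glob matched no .txt files
            out=${reads%.txt}_$name.sam
            if [ -f "$out" ]; then
                echo "skip align: $reads vs $ref"
            else
                # real script: bwa mem -B 2 -t 40 "$ref" "$reads" > "$out"
                echo "align: $reads vs $ref -> $out"
            fi
        done
    done
}
```

Calling plan_runs ../reference_files ../cleaned_files lists one index (or skip) line per reference, each followed by the alignment lines for that reference. Once the order looks right, the echo lines can be swapped for the real bwa commands noted in the comments.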

bash • 1.4k views

You are reinventing the wheel. Use a workflow manager like Nextflow: https://www.nextflow.io/