For Loop Is Not Working Within My Script
11.2 years ago

Hi all

I wrote a bash script that runs blastx.

#!/bin/sh -l 
# $Id: Barrine.sh federicogaiti $

echo "Right now it is:"
date
echo ""
function usage() {
echo "Barrine.sh script by Federico Gaiti, March 2013."
echo ""
echo "Run Blastx"
echo ""
echo "Usage: "
echo "Barrine.sh <All contigs (fasta extension)> <Number of fasta split sequences> <Head file with PBS command where you set up account, walltime, etc..> "
echo ""
echo "Example: "
echo "Barrine.sh InputFasta n Head "
echo ""
exit 1
}

# Testing if the number of arguments is correct
if [ $# != 3 ]
then
    usage
    exit
fi

### Declaring variables
InputFasta=$1    # my original FASTA file with all the sequences
n=$2             # number of files I want my FASTA file split into
Head=$3          # PBS options
n4=$((n/4))
n41=$((n/4 + 1))
n2=$((n/2))
n21=$((n/2 + 1))
n43=$((n/4 * 3))
n431=$((n/4*3 + 1))

echo Loading Modules
module load ucsc_utilities/20130122
wait

echo Split input FASTA in n FASTA file
faSplit sequence ${InputFasta} ${n} ${InputFasta}_Split_S 
wait

echo blastx command splitting it in 4 jobs to make it faster

for i in {000..0$n4}
do
    echo "blastx -query ${InputFasta}_Split_S$i.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out ${InputFasta}_blastx$i.csv"
done > ${InputFasta}_Job1.sh

for i in {0$n41..$n2}
do
    echo "blastx -query ${InputFasta}_Split_S$i.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out ${InputFasta}_blastx$i.csv"
done > ${InputFasta}_Job2.sh

for i in {$n21..$n43}
do
    echo "blastx -query ${InputFasta}_Split_S$i.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out ${InputFasta}_blastx$i.csv"
done > ${InputFasta}_Job3.sh

for i in {$n431..$n}
do
    echo "blastx -query ${InputFasta}_Split_S$i.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out ${InputFasta}_blastx$i.csv"
done > ${InputFasta}_Job4.sh

wait

echo Head the PBS commands to the Job files 

for i in {1..4} 
do
    cat ${Head} ${InputFasta}_Job$i.sh | sed "s/BlastJob/BlastJob_$i/g" > ${InputFasta}_BlastJob$i.sh 
done

I then submit the 4 jobs to the Barrine server. Each job file should contain lines like these:

blastx -query TEST_Split_S000.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx000.csv
blastx -query TEST_Split_S001.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx001.csv
blastx -query TEST_Split_S002.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx002.csv
blastx -query TEST_Split_S003.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx003.csv
blastx -query TEST_Split_S004.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx004.csv
blastx -query TEST_Split_S005.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx005.csv
blastx -query TEST_Split_S006.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx006.csv
blastx -query TEST_Split_S007.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx007.csv
............
blastx -query TEST_Split_S050.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx050.csv

But what I get in my job file is just:

blastx -query TEST_Split_S{000..050}.fa -evalue 0.0001 -max_target_seqs 100 -db nr -num_threads 8 -outfmt '7 qseqid qlen sseqid pident length mismatch gapopen qstart qend sstart send ppos evalue bitscore score' -out TEST_blastx{000..050}.csv

Can someone help me solve this problem? If I run the for loop outside my script, it works fine.

Thanks for your help.

11.2 years ago

Funny, I work at UQ as well!

I think your problem is that Bash treats ranges differently depending on whether the bounds are literals or variables, because brace expansion is performed before variable expansion:

{1..4}

works, but

{1..$end}

doesn't. For that you have to use

$(seq 1 $end)
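
Since the split files in the question use three-digit suffixes (S000, S001, ...), the loop also needs zero padding. Here is a minimal sketch of how that might look, assuming a seq that accepts a printf-style -f format (GNU and BSD seq do). The values of prefix, start and end are placeholders for illustration, and the blastx line is shortened:

```shell
#!/bin/bash
# Placeholder values for illustration only; in the script above they
# would come from the command-line arguments and the n/4 arithmetic.
prefix="TEST"
start=0
end=3

# seq -f takes a printf-style format; %03g pads each number to three digits.
for i in $(seq -f "%03g" "$start" "$end")
do
    echo "blastx -query ${prefix}_Split_S$i.fa -out ${prefix}_blastx$i.csv"
done

# Alternative without seq: a C-style loop, with printf doing the padding.
for ((j = start; j <= end; j++))
do
    printf -v padded "%03d" "$j"
    echo "${prefix}_Split_S${padded}.fa"
done
```

Both loops need bash rather than plain sh for the C-style form, so the script should be run with a bash shebang if you go that way.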

Edit: By the way, GNU Parallel is a great tool for running many BLAST jobs on several CPUs from a single input file. It can split up the sequences automatically, so you don't have to write long bash scripts. See "Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them". It's not installed on Barrine yet.

Very funny! Thanks for the tips. I'll try $(seq 1 $end) and see if it works, and I'll definitely read up on GNU Parallel. Thanks!
