Question: Blast script over multiple databases
19 months ago by
dllopezr60
dllopezr60 wrote:

Hi community

I have a folder with multiple blast databases. I want to run blastn over all databases and produce one output for each database.

I'm trying something like this:

for i in `find . -name 'name_of_database'`; do
  time blastn -db "$i" -query Sondas_100.fasta -out "$i".out  -outfmt 7 -num_threads 16 -dust yes -ungapped
done

But this approach searches for a file name, whereas a BLAST database is referred to by an alias rather than by a file.

Any help with that?

Thank you so much

modified 19 months ago • written 19 months ago by dllopezr60

Do you need an output per DB or will one output over all DBs do it as well?

What exactly do you mean by 'aliases'?

modified 19 months ago • written 19 months ago by lieven.sterck8.7k

Hi Lieven

By "aliases" I mean that the name of a BLAST database is not a file, but a name that represents a set of files.

Example: makeblastdb produces 3 files named T1P1T0.nhr, T1P1T0.nsq and T1P1T0.nal, but the database name to pass to blastn is just T1P1T0, without any extension.

And yes, I want one output for each database.

written 19 months ago by dllopezr60
19 months ago by
dllopezr60
dllopezr60 wrote:

I have already solved it.

My databases follow the naming pattern T"x"P"x"_T"x", where each x is a number (1 to 4 for the first two, 0 to 3 for the last, as in the loops below).

I generate all the names and pass them to the blast command:

    #!/bin/bash

    # Directory holding the databases (path as used by the poster)
    db_dir=/vault2/homehpc/jmalagont/dllopezr/Shotgun_Seq/Trimmed_Seqs/FastaSeqs12

    for T in $(seq 1 4); do
        for P in $(seq 1 4); do
            for t in $(seq 0 3); do
                # Build the shared name stem, e.g. T1P1_T0
                name="T${T}P${P}_T${t}"
                time blastn -db "$db_dir/${name}_R1" -query Sondas_100.fasta -out "${name}_R2.out" -outfmt 7 -num_threads 16 -dust yes -ungapped
            done
        done
    done
written 19 months ago by dllopezr60

Yes, this will work (and congrats on solving it), but do consider using jrj.healey's approach, as it is much more generally applicable!

written 19 months ago by lieven.sterck8.7k
19 months ago by
Joe18k
United Kingdom
Joe18k wrote:

All you need to do is strip the extension off the result of your find command:

e.g.

for i in $(find . -name 'name_of_database.nhr') ; do
  # Strip the extension to get the database name that blastn expects
  database="${i%.*}"
  time blastn -db "$database" -query Sondas_100.fasta -out "$database".out  -outfmt 7 -num_threads 16 -dust yes -ungapped
done

Including the extension in the find pattern ensures that each database is found only once, rather than once per related file. The extension is then stripped, and the remaining path, which should correspond to the name of the database, is passed to blastn.

*Not tested
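
To cover every database in a folder rather than one named database, the same idea can be wrapped in two small functions. This is only a sketch under a few assumptions: the function names list_blast_dbs and blast_all_dbs are made up for illustration, every database is assumed to have exactly one .nhr file (multi-volume databases would need extra handling), and the blastn options are simply copied from the question.

```shell
#!/bin/bash

# Print the database name (path without extension) of every nucleotide
# BLAST database under the given directory, one per line.
# Matching a single extension (.nhr) lists each database only once.
list_blast_dbs() {
    local dir="${1:-.}"
    find "$dir" -type f -name '*.nhr' | LC_ALL=C sort | while IFS= read -r hdr; do
        printf '%s\n' "${hdr%.nhr}"
    done
}

# Run blastn once per database, writing <database>.out next to the
# database files. The query defaults to the file used in the question.
blast_all_dbs() {
    local dir="${1:-.}" query="${2:-Sondas_100.fasta}" db
    list_blast_dbs "$dir" | while IFS= read -r db; do
        # stdin is redirected so blastn cannot consume the list of names
        blastn -db "$db" -query "$query" -out "${db}.out" \
            -outfmt 7 -num_threads 16 -dust yes -ungapped < /dev/null
    done
}
```

Something like `blast_all_dbs . Sondas_100.fasta` would then run the search over every database below the current directory. Reading the names line by line, instead of looping over `$(find ...)`, also keeps paths that contain spaces intact.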

modified 19 months ago • written 19 months ago by Joe18k