Question: How to run multiple BLAST searches on multi-FASTA file(s) with command-line BLAST
2.8 years ago by
tcf.hcdg60
European Union
tcf.hcdg60 wrote:

Hello, I have a dataset of around 30,000 query sequences and would like to run a BLAST search for this dataset against 4 different databases.

I know how to run BLAST from the command line, and it works fine for an individual case.

I wonder if there is any way to run multiple BLAST searches automatically from the command line, where the databases and BLAST parameters are different for each case.

blast • 2.3k views
modified 2.8 years ago by Pierre Lindenbaum • written 2.8 years ago by tcf.hcdg

While setting these up would be relatively straightforward with a for loop, the question is whether you are planning to submit the jobs via a scheduler to a cluster or run them locally in serial fashion. That is a big input dataset. Do you need to submit 30,000 separate queries, or are you planning to chunk them into sets of multi-FASTA files?

modified 2.8 years ago • written 2.8 years ago by genomax

I would like to run them locally on my computer. The sequences are in multi-FASTA files, and I need to run BLAST for each file separately, with different parameters and a different database.

written 2.8 years ago by tcf.hcdg

where databases and blast parameters are different for each case.

What is your "model"? Do you have a file listing the FASTA file(s) and the conditions? A TSV file? An XML file?
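As one possible shape for such a model, the sketch below reads a small tab-separated table and turns each row into a blastn command line. The column layout (query file, database, percent identity), the file names and the case1.blast, case2.blast output names are all assumptions made for illustration; only the blastn options -query, -db, -perc_identity and -out come from BLAST+ itself.

```javascript
// Hypothetical tab-separated "model": query file, database, percent identity.
var conditionsTsv = [
    "query\tdatabase\tidentity",
    "data1.fa\treference1\t70",
    "data1.fa\treference2\t90",
    "data2.fa\treference3\t70"
].join("\n");

// Turn each non-empty data row into one blastn command line.
function commandsFromTsv(tsv) {
    var lines = tsv.split(/\r?\n/).filter(function (line) {
        return line.trim().length > 0;
    });
    // Skip the header row, then split each remaining row on tabs.
    return lines.slice(1).map(function (line, index) {
        var fields = line.split("\t");
        var identity = Number(fields[2]);
        if (fields.length < 3 || isNaN(identity)) {
            throw new Error("Malformed data row " + (index + 1) + ": " + line);
        }
        // Output files are simply numbered in row order (an assumed convention).
        return "blastn -query " + fields[0] + " -db " + fields[1] +
            " -perc_identity " + identity + " -out case" + (index + 1) + ".blast";
    });
}

var tsvCommands = commandsFromTsv(conditionsTsv);

// `print` exists under jrunscript; fall back to console.log elsewhere (e.g. Node.js).
var emitTsv = (typeof print === "function") ? print : function (s) { console.log(s); };
tsvCommands.forEach(function (c) { emitTsv(c); });
```

A table like this suits the situation where each case has its own database and parameters, because only the listed rows are run rather than every combination.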

written 2.8 years ago by Pierre Lindenbaum

Actually, I have 4 different data files that contain 10k, 20k, 30k and 40k sequences respectively. I would like to search each of the files against four different databases (reference1, reference2, reference3, reference4).
There are 2 parameter settings I need to search with, i.e. % identity of 70% and 90%. In total I would like to run 4 x 4 x 2 = 32 BLAST searches.

I would like to do it locally on my computer with BLAST 2.2.31+. Is there any way to do it automatically, meaning I write the parameters and database for each BLAST search in a script and get the results in 32 files, one for each?
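The 4 x 4 x 2 layout described here can be enumerated with three nested loops. The following is a minimal sketch, not a tested pipeline: the query file names (data1.fa to data4.fa) and the output naming scheme are hypothetical, and it assumes nucleotide searches with blastn, using only the -query, -db, -perc_identity and -out options.

```javascript
// Hypothetical query file names; the database names are those given in the post.
var queryFiles = ["data1.fa", "data2.fa", "data3.fa", "data4.fa"];
var referenceDbs = ["reference1", "reference2", "reference3", "reference4"];
var identityCutoffs = [70, 90];

// Build one blastn command line per (query, database, identity) combination.
function buildBlastCommands(queries, databases, identities) {
    var commands = [];
    queries.forEach(function (query) {
        // Drop the file extension so it can be reused in the output name.
        var stem = query.replace(/\.[^.]+$/, "");
        databases.forEach(function (db) {
            identities.forEach(function (identity) {
                // The output name encodes the combination (assumed convention).
                var outFile = stem + "_vs_" + db + "_id" + identity + ".blast";
                commands.push("blastn -query " + query + " -db " + db +
                    " -perc_identity " + identity + " -out " + outFile);
            });
        });
    });
    return commands;
}

var blastCommands = buildBlastCommands(queryFiles, referenceDbs, identityCutoffs);

// `print` exists under jrunscript; fall back to console.log elsewhere (e.g. Node.js).
var emitCommand = (typeof print === "function") ? print : function (s) { console.log(s); };
blastCommands.forEach(function (c) { emitCommand(c); });
```

Redirecting the printed lines to a file gives a plain list of 32 commands, one per line, which could then be run one after another as a batch file or shell script.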

modified 2.8 years ago • written 2.8 years ago by tcf.hcdg

I made the four databases locally on my computer with makeblastdb.

written 2.8 years ago by tcf.hcdg
2.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

You could define your model in JSON/JavaScript. Below is a program for jrunscript: https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jrunscript.html


var out = java.lang.System.out;
// Model: every combination of these values yields one BLAST search.
var model = {
    "fasta": ["f1.fa", "f2.fa", "f3.fa"],
    "database": ["db1.fa", "db2.fa", "db6.fa", "db7.fa"],
    "evalue": [1.0, 0.7],
    "identity": [70, 80]
};

var i, j, k, m, targetid = 0;
out.println("T=");
out.println("all: all_targets");

// One rule per database: build the BLAST index (.nin) from the FASTA file.
for (i = 0; i < model.database.length; ++i) {
    var name = model.database[i];
    out.println("$(addsuffix .nin," + name + "): " + name);
    out.println("\tmakeblastdb -dbtype nucl -in $<");
}

// One rule per (fasta, database, evalue, identity) combination.
for (i in model.fasta)
    for (j in model.database)
        for (k in model.evalue)
            for (m in model.identity) {
                targetid++;
                // Register the target, then emit its rule and blastn recipe.
                out.println("T+= t" + targetid + ".blast");
                out.println("t" + targetid + ".blast : $(addsuffix .nin," + model.database[j] + ") " + model.fasta[i]);
                out.println("\tblastn -db " + model.database[j] + " -out $@ -evalue " + model.evalue[k] + " -query " + model.fasta[i] + " -perc_identity " + model.identity[m]);
            }

// The final target depends on every generated BLAST output.
out.println("all_targets: ${T}");

The program loops over the parameters and generates a Makefile. Here is the output of jrunscript input.js:

T=
all: all_targets
$(addsuffix .nin,db1.fa): db1.fa
    makeblastdb -dbtype nucl -in $<
$(addsuffix .nin,db2.fa): db2.fa
    makeblastdb -dbtype nucl -in $<
$(addsuffix .nin,db6.fa): db6.fa
    makeblastdb -dbtype nucl -in $<
$(addsuffix .nin,db7.fa): db7.fa
    makeblastdb -dbtype nucl -in $<
(...)
T+= t46.blast
t46.blast : $(addsuffix .nin,db7.fa) f3.fa
    blastn -db db7.fa -out $@ -evalue 1 -query f3.fa -perc_identity 80
T+= t47.blast
t47.blast : $(addsuffix .nin,db7.fa) f3.fa
    blastn -db db7.fa -out $@ -evalue 0.7 -query f3.fa -perc_identity 70
T+= t48.blast
t48.blast : $(addsuffix .nin,db7.fa) f3.fa
    blastn -db db7.fa -out $@ -evalue 0.7 -query f3.fa -perc_identity 80
all_targets: ${T}

You can pipe the output into GNU make and run several jobs in parallel (option -j of make):

(not tested)

$ jrunscript input.js | make -f - -j 10

makeblastdb -dbtype nucl -in db1.fa
blastn -db db1.fa -out t1.blast -evalue 1 -query f1.fa -perc_identity 70
blastn -db db1.fa -out t2.blast -evalue 1 -query f1.fa -perc_identity 80
blastn -db db1.fa -out t3.blast -evalue 0.7 -query f1.fa -perc_identity 70
blastn -db db1.fa -out t4.blast -evalue 0.7 -query f1.fa -perc_identity 80
makeblastdb -dbtype nucl -in db2.fa
blastn -db db2.fa -out t5.blast -evalue 1 -query f1.fa -perc_identity 70
blastn -db db2.fa -out t6.blast -evalue 1 -query f1.fa -perc_identity 80
blastn -db db2.fa -out t7.blast -evalue 0.7 -query f1.fa -perc_identity 70
blastn -db db2.fa -out t8.blast -evalue 0.7 -query f1.fa -perc_identity 80
makeblastdb -dbtype nucl -in db6.fa
blastn -db db6.fa -out t9.blast -evalue 1 -query f1.fa -perc_identity 70
blastn -db db6.fa -out t10.blast -evalue 1 -query f1.fa -perc_identity 80
blastn -db db6.fa -out t11.blast -evalue 0.7 -query f1.fa -perc_identity 70
blastn -db db6.fa -out t12.blast -evalue 0.7 -query f1.fa -perc_identity 80
makeblastdb -dbtype nucl -in db7.fa
blastn -db db7.fa -out t13.blast -evalue 1 -query f1.fa -perc_identity 70
blastn -db db7.fa -out t14.blast -evalue 1 -query f1.fa -perc_identity 80
blastn -db db7.fa -out t15.blast -evalue 0.7 -query f1.fa -perc_identity 70
blastn -db db7.fa -out t16.blast -evalue 0.7 -query f1.fa -perc_identity 80
blastn -db db1.fa -out t17.blast -evalue 1 -query f2.fa -perc_identity 70
blastn -db db1.fa -out t18.blast -evalue 1 -query f2.fa -perc_identity 80
blastn -db db1.fa -out t19.blast -evalue 0.7 -query f2.fa -perc_identity 70
blastn -db db1.fa -out t20.blast -evalue 0.7 -query f2.fa -perc_identity 80
blastn -db db2.fa -out t21.blast -evalue 1 -query f2.fa -perc_identity 70
blastn -db db2.fa -out t22.blast -evalue 1 -query f2.fa -perc_identity 80
blastn -db db2.fa -out t23.blast -evalue 0.7 -query f2.fa -perc_identity 70
blastn -db db2.fa -out t24.blast -evalue 0.7 -query f2.fa -perc_identity 80
blastn -db db6.fa -out t25.blast -evalue 1 -query f2.fa -perc_identity 70
blastn -db db6.fa -out t26.blast -evalue 1 -query f2.fa -perc_identity 80
blastn -db db6.fa -out t27.blast -evalue 0.7 -query f2.fa -perc_identity 70
blastn -db db6.fa -out t28.blast -evalue 0.7 -query f2.fa -perc_identity 80
(...)
modified 2.8 years ago • written 2.8 years ago by Pierre Lindenbaum

@Pierre: The problem is whether the jobs submitted this way are going to play nice (i.e. wait until the first one completes). @tcf.hcdg wants to do this on a standalone computer.

written 2.8 years ago by genomax

On a standalone computer: don't use the option '-j'.

written 2.8 years ago by Pierre Lindenbaum

I tried to run the "js" file with the following options, but it gives the following error. I haven't worked with Java before, so I don't have even basic knowledge of it (apologies if I made a basic mistake in running the command).

C:\myprog\blast-2.2.31+>jrunscript -e -l js -f input.js
'jrunscript' is not recognized as an internal or external command,
operable program or batch file.

C:\myprog\blast-2.2.31+>js
'js' is not recognized as an internal or external command,
operable program or batch file.

I have BLAST installed locally on my "windows computer"

modified 2.8 years ago • written 2.8 years ago by tcf.hcdg

You will have to install the Java Development Kit (JDK) for Windows to use jrunscript.

modified 2.8 years ago • written 2.8 years ago by genomax