Question: Script to run blast locally with multiple files in a directory as queries
anicet.ebou wrote:

Hi everyone,

I have been looking for a script that lets me run BLAST locally on multiple FASTA files contained in a directory. I found this one-line bash command, but it produces a warning when doing the job:

find . -type f -exec blastp -query '{}' -db swissprot -out '{}'_blastp.fas \;

Warning: [blastp] Query is Empty !

I would like a solution that avoids this warning. I'm working on Linux 16.04 and running BLAST from the terminal.

Thanks in advance.

written 21 months ago by anicet.ebou
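
One possible cause of the warning, not confirmed anywhere in this thread: `find . -type f` matches every regular file under the directory, so empty files and non-FASTA files (potentially including `_blastp.fas` outputs left by an earlier run) are also handed to blastp as queries. Below is a minimal sketch that narrows the search, assuming the queries end in `.fas` and the outputs end in `_blastp.fas`. The function names `list_queries` and `blast_queries` are made up for illustration.

```shell
# List non-empty *.fas files directly inside a directory (default: "."),
# skipping files that look like earlier BLAST outputs.
# Assumption: queries end in .fas and outputs end in _blastp.fas.
list_queries() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.fas' \
        ! -name '*_blastp.fas' ! -empty | sort
}

# Run blastp once per listed query; X.fas produces X_blastp.fas.
blast_queries() {
    list_queries "${1:-.}" | while IFS= read -r query; do
        blastp -query "$query" -db swissprot -out "${query%.fas}_blastp.fas"
    done
}
```

Running `list_queries .` first shows exactly which files would be submitted, which makes it easy to spot an empty or unexpected file before calling `blast_queries .`.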
Petr Ponomarenko wrote:

I love GNU parallel for such things. Something like

ls *.fasta | parallel -a - blastp -query {} -db swissprot -out {.}.out

since it lets you run many jobs in parallel.

written 21 months ago by Petr Ponomarenko

I got this output when running your code:

parallel: invalid option -- 'a'
parallel [OPTIONS] command -- arguments
    for each argument, run command with argument, in parallel
parallel [OPTIONS] -- commands
    run specified commands in parallel

written 21 months ago by anicet.ebou
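
The usage text quoted in this reply appears to match the `parallel` shipped in the moreutils package rather than GNU parallel, which would explain why `-a` is rejected. Here is a hedged sketch that checks which one is on the PATH before starting any jobs. `is_gnu_parallel` and `blast_parallel` are illustrative names, and the `.fas` extension and `_blastp.fas` suffix are assumptions based on the file names discussed in this thread.

```shell
# True if the `parallel` on PATH identifies itself as GNU parallel.
is_gnu_parallel() {
    parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

# Run blastp on every *.fas file in the current directory, several at a time.
# {} is the input file name and {.} is the same name without its extension.
blast_parallel() {
    if ! is_gnu_parallel; then
        echo "the parallel on PATH does not look like GNU parallel" >&2
        return 1
    fi
    parallel blastp -query {} -db swissprot -out {.}_blastp.fas ::: *.fas
}
```

Passing the files after `:::` avoids parsing the output of `ls`. If no `.fas` file exists, the pattern is passed through unchanged, so it is worth checking the directory first.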

+1, thanks, this is indeed an interesting alternative. Could you please tell me how to get it on CentOS? Is it built in, or should I do a yum install?

written 21 months ago by bioExplorer

Please refer to https://www.gnu.org/software/parallel/parallel_tutorial.html

(wget -O - pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3) | bash

Usually it is packaged with your distribution, and I have seen it on CentOS as well.

written 20 months ago by Petr Ponomarenko
bioExplorer wrote:

A simple for loop should be enough!

# assuming your query files have the ".fasta" extension

for i in *.fasta; do
    # base name: the text before the first "."
    name=$(echo "$i" | awk -F "." '{print $1}')
    blastp -query $i -db swissprot -out ${name}.out
done

written 21 months ago by bioExplorer

Can we use cut -f 1 -d "." instead of awk -F "." '{print $1}' ?

name=$(echo $i | cut -f 1 -d ".")

written 21 months ago by cpad0112

Why not? It does the same thing!

written 21 months ago by bioExplorer
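
Both the awk and cut versions keep the text before the first dot. Shell parameter expansion is a third option that strips only the last extension and starts no extra process. The two approaches agree for names like those in this thread but differ when a name contains more than one dot. A small sketch for comparison follows; the function names and the file name `sample.v2.fasta` are hypothetical.

```shell
# Text before the FIRST dot, as the awk and cut versions compute it.
name_first_dot() {
    printf '%s\n' "$1" | cut -f 1 -d "."
}

# Strip only the LAST extension, using parameter expansion.
name_strip_ext() {
    printf '%s\n' "${1%.*}"
}

name_first_dot "Allpep_subset_00.fas"   # prints Allpep_subset_00
name_strip_ext "Allpep_subset_00.fas"   # prints Allpep_subset_00
name_first_dot "sample.v2.fasta"        # prints sample
name_strip_ext "sample.v2.fasta"        # prints sample.v2
```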

Your script does not seem to work as I want. It runs only one file in my directory as a query, and the output file is either not named as needed or does not appear at all!

To be clear, my code works perfectly but throws warnings. My goal is just to have a new script, or a way to make my own script run without these warnings. Thanks @Vijay Lakhujani

written 21 months ago by anicet.ebou

Change the following line in Vijay's code from

blastp -query $i -db swissprot -out ${name}.out

to

blastp -query $i -db swissprot -out ${name}_blastp.fas

Run the modified code and let us know whether it does what you need. By the way, how many FASTA files do you have in your directory (i.e. files with the .fasta extension)? What is their extension (.fa or .fasta), or are they zipped?

written 21 months ago by cpad0112

This code gives the same output as Vijay's code. I have 50 FASTA files with the .fas extension.

written 21 months ago by anicet.ebou

I cannot see why this would not work. This is a very basic and routine task. As cpad0112 mentioned, let me know the file extension of your FASTA files.

Also run the ls command and share the output so that we can see which files you have in your directory, and share the error message, if any.

Last but not least, we are assuming that you have the correct paths for running blastp and for the swissprot database.

written 21 months ago by bioExplorer
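
The assumption about paths can be checked before any query is run. Below is a rough sketch, assuming the BLAST+ tools `blastp` and `blastdbcmd` are the ones in use; `check_blast_setup` is an illustrative name, not part of BLAST.

```shell
# Report whether blastp is on PATH and whether the named database can be opened.
check_blast_setup() {
    db=${1:-swissprot}
    if ! command -v blastp >/dev/null 2>&1; then
        echo "blastp not found on PATH" >&2
        return 1
    fi
    # blastdbcmd -info prints a summary of the database when BLAST can locate it.
    if ! blastdbcmd -db "$db" -info >/dev/null 2>&1; then
        echo "database '$db' not found (check its path or the BLASTDB variable)" >&2
        return 1
    fi
    echo "blastp and database '$db' look usable"
}
```

Calling `check_blast_setup swissprot` once, from the directory that holds the queries, separates setup problems from problems with the loop itself.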

I got no error message, but the output is not what I need.

ediman@ediman-HP-Notebook:~/all pep$ ls

Allpep_subset_00.fas  Allpep_subset_18.fas  Allpep_subset_36.fas
Allpep_subset_01.fas  Allpep_subset_19.fas  Allpep_subset_37.fas
Allpep_subset_02.fas  Allpep_subset_20.fas  Allpep_subset_38.fas
Allpep_subset_03.fas  Allpep_subset_21.fas  Allpep_subset_39.fas
Allpep_subset_04.fas  Allpep_subset_22.fas  Allpep_subset_40.fas
Allpep_subset_05.fas  Allpep_subset_23.fas  Allpep_subset_41.fas
Allpep_subset_06.fas  Allpep_subset_24.fas  Allpep_subset_42.fas
Allpep_subset_07.fas  Allpep_subset_25.fas  Allpep_subset_43.fas
Allpep_subset_08.fas  Allpep_subset_26.fas  Allpep_subset_44.fas
Allpep_subset_09.fas  Allpep_subset_27.fas  Allpep_subset_45.fas
Allpep_subset_10.fas  Allpep_subset_28.fas  Allpep_subset_46.fas
Allpep_subset_11.fas  Allpep_subset_29.fas  Allpep_subset_47.fas
Allpep_subset_12.fas  Allpep_subset_30.fas  Allpep_subset_48.fas
Allpep_subset_13.fas  Allpep_subset_31.fas  Allpep_subset_49.fas
Allpep_subset_14.fas  Allpep_subset_32.fas  Allpep_subset_50.fas
Allpep_subset_15.fas  Allpep_subset_33.fas  Allpep_subset_51.fas
Allpep_subset_16.fas  Allpep_subset_34.fas  Allpep_subset_52.fas
Allpep_subset_17.fas  Allpep_subset_35.fas

written 21 months ago by anicet.ebou

In Vijay's code, change from:

for i in *.fasta; do

to

for i in *.fas; do

Run the code and let us know.

Or modify Petr's code from

ls *.fasta | parallel -a - blastp -query {} -db swissprot -out {.}.out

to

ls *.fas | parallel -a - blastp -query {} -db swissprot -out {.}.out

written 21 months ago by cpad0112
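
The behaviour reported earlier in the thread (a single run and an oddly named or missing output) is consistent with how the shell treats a pattern that matches nothing: `*.fasta` is left as the literal string `*.fasta`, so the loop body runs once on a file that does not exist. This is an inference, not something the thread confirms. The sketch below shows a loop that guards against that case and also skips its own outputs, assuming the `.fas` extension and the `_blastp.fas` suffix mentioned above. `blast_all` is an illustrative name.

```shell
# Run blastp on every file with the given extension (default: fas)
# in the current directory, writing X_blastp.<ext> for each query X.<ext>.
blast_all() {
    ext=${1:-fas}
    count=0
    for query in *."$ext"; do
        # An unmatched pattern stays literal, so skip anything that is not a file.
        [ -f "$query" ] || continue
        # Outputs share the extension; do not feed them back in as queries.
        case $query in *_blastp."$ext") continue ;; esac
        blastp -query "$query" -db swissprot -out "${query%.*}_blastp.$ext"
        count=$((count + 1))
    done
    echo "$count file(s) processed"
}
```

With this guard, `blast_all fasta` in a directory that only holds `.fas` files reports `0 file(s) processed` instead of calling blastp on a non-existent file.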