Blast locally with multiple files in a directory as queries
1
0
3.5 years ago
fec2 ▴ 40

Hi all,

I need to run blast locally on multiple fasta files contained in a directory. Following the thread "Script to run blast locally with multiple files in a directory as queries",

I have tried:

for i in *.fasta; do ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.xls ; done


It works on my Mac, but a run takes a whole day to finish. I have 44 fasta files in the directory, and I noticed that the blast was actually repeated many times before it stopped. Are there any alternatives I could try?

Thank you.

genome • 2.2k views
0

Do us a favour and don't call your output files .xls ;-)

How big are the fasta files (in size, or in number of entries)?

0

The files range from 1 to 1.5 MB each.

0

I have 44 fasta files in the directory, and I noticed that the blast was actually repeated many times before it stopped.

It is possible that you are exhausting a hardware resource on your Mac (most likely RAM). Have you made sure that you are able to complete one of these jobs with the database you are using before trying to start many in parallel?
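One way to check this is to run a single query on its own before launching anything in parallel. The sketch below is only an illustration: blast_one is a hypothetical helper name, and the blastp options are simply copied from the command in the question.

```shell
# Hypothetical helper: run ONE query serially to confirm the job can finish
# with this database before starting many jobs at once.
blast_one() {
    query=$1                      # path to a single FASTA file
    db=$2                         # BLAST database name, e.g. mydatabase
    out="${query%.fasta}.tsv"     # same basename, .tsv extension

    # Options below are taken from the command posted in the question.
    if blastp -query "$query" -db "$db" -evalue 0.00001 -qcov_hsp_perc 50 \
              -outfmt 6 -max_target_seqs 1 -out "$out"; then
        echo "ok: $query -> $out"
    else
        echo "failed: $query" >&2
        return 1
    fi
}

# Example (bash): time one job and watch memory use while it runs.
#   time blast_one first.fasta mydatabase
```

If one job already takes a long time or uses most of the RAM, running many of them at once is unlikely to help.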

0

Thanks for your comment. As jrj.healey suggested, I removed the loop and it is working well now.

4
3.5 years ago
Joe 20k

You are listing your files multiple times, then looping unnecessarily before running the command in parallel. You're at least duplicating the amount of work needed, and in fact it is worse than that: the for loop never uses $i, so each iteration just pipes the full file list to parallel again. With 44 files, every query gets BLASTed 44 times.
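To make the repetition concrete, here is a small sketch that only counts how many blastp jobs each form would start, without running BLAST at all. The function names count_jobs_looped and count_jobs_once are made up for this illustration; the nested loop stands in for the for loop wrapped around the parallel pipeline.

```shell
# Counts the jobs started by: for i in *.fasta; do ls *.fasta | parallel ...; done
count_jobs_looped() {
    dir=$1
    n=0
    for i in "$dir"/*.fasta; do          # outer loop: once per file, $i is unused
        [ -e "$i" ] || continue
        for q in "$dir"/*.fasta; do      # inner list: every file again, each time
            [ -e "$q" ] || continue
            n=$((n + 1))
        done
    done
    echo "$n"
}

# Counts the jobs started by: ls *.fasta | parallel ...
count_jobs_once() {
    dir=$1
    n=0
    for q in "$dir"/*.fasta; do          # each file is queued exactly once
        [ -e "$q" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# With 44 fasta files in the current directory:
#   count_jobs_looped .   # prints 1936 (44 x 44)
#   count_jobs_once .     # prints 44
```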

It will be sufficient to do:

ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.tsv
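If GNU parallel is not available, or you want to rule it out while debugging, a plain shell loop gives the same one-job-per-file behaviour, just serially. This is a rough sketch rather than a drop-in replacement: blast_each is a hypothetical name, the blastp options are the ones from the command above, and ${f%.fasta}.tsv plays the role of {.}.tsv.

```shell
# Hypothetical helper: run blastp once per FASTA file in a directory, serially.
blast_each() {
    dir=$1    # directory holding the *.fasta query files
    db=$2     # BLAST database name, e.g. mydatabase

    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        # Strip the .fasta suffix and write a tab-separated result file.
        blastp -query "$f" -db "$db" -evalue 0.00001 -qcov_hsp_perc 50 \
               -outfmt 6 -max_target_seqs 1 -out "${f%.fasta}.tsv" || return 1
    done
}

# Example:
#   blast_each . mydatabase
```

The loop stops at the first failing job, which makes it easier to spot a problem query than when many jobs are running at once.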


Exactly how long it will take under ideal circumstances is not easy to say ahead of time. The process will run faster with fewer, shorter sequences, but it also depends on how quickly a good match can be found (better matches can be returned faster).

0

Oh I see, thank you very much!