Question: Blast locally with multiple files in a directory as queries
fec2 wrote (26 days ago):

Hi all,

I need to run BLAST locally on multiple FASTA files contained in a directory. Following the post "Script to run blast locally with multiple files in a directory as queries",

I have tried:

for i in *.fasta; do ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.xls ; done

It works on my Mac, but a single run takes a whole day to finish. I have 44 FASTA files in the directory, and I noticed that the BLAST searches were actually repeated many times before the run stopped. Are there any alternatives I could use?

Thank you.


Do us a favour and don't call your output files .xls ;-)

How big are the FASTA files (in file size, or in number of entries)?
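One way to answer that question from the command line is sketched below. It is only an illustration: the scratch directory and example.fasta are made-up demo inputs, and the entry count assumes every sequence header starts with ">" at the beginning of a line.

```shell
#!/usr/bin/env bash
# Report the size in bytes and the number of sequences for each FASTA file.
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory so the demo touches no real data
cd "$workdir"
printf '>seq1\nMKV\n>seq2\nMLA\n' > example.fasta   # hypothetical demo file

for f in *.fasta; do
  bytes=$(wc -c < "$f" | tr -d ' ')      # tr strips the padding some wc versions add
  entries=$(grep -c '^>' "$f" || true)   # grep exits 1 on a zero count, so guard it
  printf '%s\t%s bytes\t%s entries\n' "$f" "$bytes" "$entries"
done

cd / && rm -rf "$workdir"
```

In a real directory, drop the scratch-directory lines and run only the loop.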

written 26 days ago by lieven.sterck

The files range from 1 to 1.5 MB each.

written 26 days ago by fec2

"I have 44 fasta files in the directory, and I noticed that the blast was actually repeated many times before it stop."

It is possible that you are exhausting a hardware resource on your Mac (most likely RAM). Have you made sure that you are able to complete one of these jobs with the database you are using before trying to start many in parallel?

written 26 days ago by genomax

Thanks for your comment. As jrj.healey suggested, I removed the loop and it is working well now.

written 26 days ago by fec2
jrj.healey wrote (26 days ago):

You are listing your files multiple times, then looping unnecessarily before running the command in parallel. You're at least duplicating the amount of work needed, and at a glance it looks like it may be even worse than that.
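To see how much work the looped form repeats, here is a small dry run. It is a sketch under stated assumptions: a `while read` loop stands in for GNU parallel and `echo` stands in for blastp, so nothing is actually searched, and the three file names are made up. The outer loop never uses its variable, so every iteration queues every file again: N files give N × N searches (44 × 44 = 1936 for the directory in the question).

```shell
#!/usr/bin/env bash
# Dry run: count the searches launched by the looped form versus a single pipeline.
# 'while read' stands in for GNU parallel and 'echo' for blastp (assumptions for the demo).
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory with hypothetical input files
cd "$workdir"
touch a.fasta b.fasta c.fasta

# Looped form: the whole 'ls | ...' pipeline is repeated once per file.
looped=$(for i in *.fasta; do
           ls *.fasta | while read -r f; do echo "blastp -query $f"; done
         done | wc -l | tr -d ' ')

# Single pipeline: each file is queued exactly once.
single=$(ls *.fasta | while read -r f; do echo "blastp -query $f"; done | wc -l | tr -d ' ')

echo "looped: $looped searches, single: $single searches"

cd / && rm -rf "$workdir"
```

With three files this counts 9 searches for the looped form and 3 for the single pipeline.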

It will be sufficient to do:

ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.tsv
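A possible variant, offered as a sketch rather than a tested recipe for this dataset: GNU parallel can take the file list directly after `:::`, which avoids piping `ls` output, and `--dry-run` prints the commands it would run without executing them. The `-j 2` value is an arbitrary illustration of capping the number of simultaneous jobs (which may help if memory is tight), and the two file names are made up. The check at the top skips the demo if GNU parallel is not installed.

```shell
#!/usr/bin/env bash
# Preview the blastp commands GNU parallel would launch, without running them.
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory with hypothetical input files
cd "$workdir"
touch a.fasta b.fasta

planned=""
if command -v parallel >/dev/null 2>&1 &&
   parallel --version 2>/dev/null | grep 'GNU parallel' >/dev/null; then
  # --dry-run prints one command line per input file; -j 2 caps concurrent jobs.
  planned=$(parallel --dry-run -j 2 \
    blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 \
           -outfmt 6 -max_target_seqs 1 -out {.}.tsv ::: *.fasta)
  printf '%s\n' "$planned"
else
  echo "GNU parallel not found; skipping the preview" >&2
fi

cd / && rm -rf "$workdir"
```

Removing `--dry-run` would actually start the searches, so it is worth checking the printed commands first.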

Exactly how long it will take under ideal circumstances is not easy to say ahead of time. The process will run faster with fewer, shorter sequences, but it also depends on how quickly a good match can be found (better matches can be returned faster).


Oh I see, thank you very much!

written 26 days ago by fec2