Question

Legacy BLAST and BLAST+ equivalence

0

Entering edit mode

4.6 years ago

wjn0 • 0

Hi,

I'm interested whether or not users of legacy_blast.pl have noticed that it doesn't produce equivalent output. In particular, for the sequence TVVDPVSFMFWKDEAF, the following legacy command produces the expected pssm files:

blastpgp -d /root/data/uniref90 -i thread_0.p0.fa -o thread_0.p0.aln -C thread_0.p0.chk -Q thread_0.p0.mat -j 3 -a 1 -e 0.001 -h 1e-10

while the equivalent psiblast command apparently does not:

psiblast -db /root/data/uniref90 -query thread_0.p0.fa -num_iterations 3 -evalue 0.001 -num_threads 1 -inclusion_ethresh 1e-10 -out thread_0.p0.aln -out_ascii_pssm thread_0.p0.mat -out_pssm thread_0.p0.chk

Is this the expected behaviour? Also, what is the difference between the -evalue and -inclusion_ethresh flags to psiblast? The documentation is not clear on this.

Thanks!

blast psiblast • 1.6k views

ADD COMMENT • link 4.6 years ago by wjn0 • 0

1

Entering edit mode

Is this the expected behaviour?

I don't know if the behavior is expected, but it doesn't surprise me that the output is different. With blastpgp, checkpoints are created only if the search goes beyond 1 iteration. Not sure if that's the case with psiblast as well, so I would make sure that it is going beyond 1 iteration.

Also, what is the difference between the-evalue and -inclusion_ethresh flags to psiblast?

These two flags are cutoffs for display and profile building purposes, respectively. For example, you can choose to display hits with e-values up to 10.0 while using only hits < 0.001 for multiple alignments and iterative searches.

Curious why your data directory is in /root as most people would not recommend putting anything in /root other than actual administrator files.

ADD REPLY • link 4.6 years ago by Mensur Dlakic ★ 27k

0

Entering edit mode

I don't know if the behavior is expected, but it doesn't surprise me that the output is different. With blastpgp, checkpoints are created only if the search goes beyond 1 iteration. Not sure if that's the case with psiblast as well, so I would make sure that it is going beyond 1 iteration.

This is a great point, I will check into this. I think I ran into this issue previously.

These two flags are cutoffs for display and profile building purposes, respectively. For example, you can choose to display hits with e-values up to 10.0 while using only hits < 0.001 for multiple alignments and iterative searches.

So when you say 'display', do you mean hat actually appears in output? And when you say profile building, you mean included in the pssm at each iteration, right?

Curious why your data directory is in /root as most people would not recommend putting anything in /root other than actual administrator files.

In the interest of reproducibility, I'm doing this project entirely in a docker container. Normally I'd agree about it not being good practice, but for dev in a docker container I have not run into any problems. And if I ever do, I'll just restart the container :-)

Thanks a lot for your help!

ADD REPLY • link 4.6 years ago by wjn0 • 0

0

Entering edit mode

So when you say 'display', do you mean hat actually appears in output? And when you say profile building, you mean included in the pssm at each iteration, right?

Affirmative.

ADD REPLY • link 4.6 years ago by Mensur Dlakic ★ 27k

0

Entering edit mode

Not answering your question but am curious as to why you are still using legacy BLAST that was deprecated a while back. You may need to email blast help desk to get an authoritative answer for this one.

ADD REPLY • link 4.6 years ago by GenoMax 141k

0

Entering edit mode

I'd prefer not to use legacy blast at all, but it's a dependency of some software I'm working with (ACCpro). So I'm trying to translate away from the legacy blast calls.

ADD REPLY • link 4.6 years ago by wjn0 • 0

0

Entering edit mode

Lots of people are using legacy BLAST, as evidenced by the fact that it is still maintained. As wjn0 explained, lots of classifiers - including mine - still depend on it. I haven't used "modern" BLAST enough to speak intelligently about its potential advantages in plain sequence searches, but legacy BLAST works better for me when it comes to building checkpoints and creating features for machine learning. I suspect others have observed the same pattern judging by continuous use of legacy BLAST.

ADD REPLY • link 4.6 years ago by Mensur Dlakic ★ 27k

0

Entering edit mode

I wouldn't even bother translating the legacy blast calls, except I want to batch them with other BLAST+ calls to avoid duplication and improve the speed of the pipeline. Legacy blast has never failed me.

ADD REPLY • link 4.6 years ago by wjn0 • 0