Hi,
I'm interested whether or not users of legacy_blast.pl
have noticed that it doesn't produce equivalent output. In particular, for the sequence TVVDPVSFMFWKDEAF
, the following legacy command produces the expected pssm files:
blastpgp -d /root/data/uniref90 -i thread_0.p0.fa -o thread_0.p0.aln -C thread_0.p0.chk -Q thread_0.p0.mat -j 3 -a 1 -e 0.001 -h 1e-10
while the equivalent psiblast
command apparently does not:
psiblast -db /root/data/uniref90 -query thread_0.p0.fa -num_iterations 3 -evalue 0.001 -num_threads 1 -inclusion_ethresh 1e-10 -out thread_0.p0.aln -out_ascii_pssm thread_0.p0.mat -out_pssm thread_0.p0.chk
Is this the expected behaviour? Also, what is the difference between the -evalue
and -inclusion_ethresh
flags to psiblast? The documentation is not clear on this.
Thanks!
I don't know if the behavior is expected, but it doesn't surprise me that the output is different. With
blastpgp
, checkpoints are created only if the search goes beyond 1 iteration. Not sure if that's the case withpsiblast
as well, so I would make sure that it is going beyond 1 iteration.These two flags are cutoffs for display and profile building purposes, respectively. For example, you can choose to display hits with e-values up to 10.0 while using only hits < 0.001 for multiple alignments and iterative searches.
Curious why your data directory is in
/root
as most people would not recommend putting anything in/root
other than actual administrator files.This is a great point, I will check into this. I think I ran into this issue previously.
So when you say 'display', do you mean hat actually appears in output? And when you say profile building, you mean included in the pssm at each iteration, right?
In the interest of reproducibility, I'm doing this project entirely in a docker container. Normally I'd agree about it not being good practice, but for dev in a docker container I have not run into any problems. And if I ever do, I'll just restart the container :-)
Thanks a lot for your help!
Affirmative.
Not answering your question but am curious as to why you are still using legacy BLAST that was deprecated a while back. You may need to email blast help desk to get an authoritative answer for this one.
I'd prefer not to use legacy blast at all, but it's a dependency of some software I'm working with (ACCpro). So I'm trying to translate away from the legacy blast calls.
Lots of people are using legacy BLAST, as evidenced by the fact that it is still maintained. As wjn0 explained, lots of classifiers - including mine - still depend on it. I haven't used "modern" BLAST enough to speak intelligently about its potential advantages in plain sequence searches, but legacy BLAST works better for me when it comes to building checkpoints and creating features for machine learning. I suspect others have observed the same pattern judging by continuous use of legacy BLAST.
I wouldn't even bother translating the legacy blast calls, except I want to batch them with other BLAST+ calls to avoid duplication and improve the speed of the pipeline. Legacy blast has never failed me.