Update 1: added maximum memory usage, database size and software versions. Also, PLAST now runs ~20 s slower than before, confirmed over 3 runs.
Update 2: added RAPSearch2.
So I just ran a very small benchmark. The setup is simple: the queries are the predicted proteins (13498) from a small genome, the database is the set of annotated proteins (14187) from the NCBI annotation of a close relative, with an e-value cutoff of 1e-5 and one hit per query. I ran DIAMOND, GHOSTX, PLAST, RAPSearch2 and BLAST+, all with one thread. Here are the results:
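For anyone wanting to repeat this kind of measurement, here is a minimal sketch of one way to record wall time and peak memory for a single command from Python on a Unix-like system. It is not necessarily how the numbers in this post were collected; the function name `run_and_measure` and the placeholder command are illustrative assumptions, and the real search command line has to be substituted.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd and return (wall_seconds, peak_rss_of_children).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    It is the maximum over all child processes waited for so far, so
    start a fresh Python process per tool to get a clean reading.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the actual search command.
    elapsed, peak = run_and_measure([sys.executable, "-c", "pass"])
    print(f"wall time: {elapsed:.2f} s, peak RSS: {peak}")
```

Repeating each run a few times, as was done for PLAST here, helps separate real differences from run-to-run noise.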
program | version | time | #hits | max mem | db size
blast+ | 2.7.1 | 33m20s | 12682 | 46.4 MB | 8.3 MB
plast | 2.3.2 | 5m54s | 12657 | 307.1 MB | 8.3 MB
ghostx | 1.3.7 | 2m16s | 12585 | 1149.7 MB | 303.0 MB
diamond | 0.9.22 | 1m35s | 12626 | 49.4 MB | 7.6 MB
rapsearch2 | 2.24 | 3m13s | 12578 | 973.4 MB | 63.0 MB
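To make the table easier to compare at a glance, the sketch below converts the times to seconds and derives the speedup relative to BLAST+ and the percentage of queries with a hit. The numbers are copied from the table; treating #hits as the number of queries with a hit (which follows from one hit per query) and using BLAST+ as the baseline are my assumptions for the illustration.

```python
import re

N_QUERIES = 13498  # number of query proteins, from the setup description

# (program, wall time, #hits), copied from the results table
RESULTS = [
    ("blast+", "33m20s", 12682),
    ("plast", "5m54s", 12657),
    ("ghostx", "2m16s", 12585),
    ("diamond", "1m35s", 12626),
    ("rapsearch2", "3m13s", 12578),
]


def to_seconds(text):
    """Convert a time written like '33m20s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def summarize(results, baseline="blast+"):
    """Return (program, seconds, speedup vs baseline, % queries with a hit)."""
    seconds = {name: to_seconds(t) for name, t, _ in results}
    base = seconds[baseline]
    return [
        (name, seconds[name], base / seconds[name], 100.0 * hits / N_QUERIES)
        for name, _, hits in results
    ]


for name, secs, speedup, pct in summarize(RESULTS):
    print(f"{name:<11}{secs:>6d} s {speedup:>6.1f}x {pct:>7.2f}% of queries hit")
```

With these numbers, DIAMOND comes out about 21 times faster than BLAST+ while finding hits for 56 fewer queries.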
Conclusions: in this setting, GHOSTX and RAPSearch2 are clearly not the best choices, as DIAMOND is both faster and more sensitive (judged by number of hits) than either. BLAST+ is the most sensitive of them all, as expected, but a lot slower. PLAST sits between BLAST+ and DIAMOND in both speed and sensitivity.
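The reasoning behind these conclusions can be sketched as a simple dominance check: a program is "dominated" if some other program is both faster and finds more hits. The times below are the table values converted to seconds; the function name `dominated_by` is just an illustrative choice, and the check deliberately ignores memory and database size.

```python
# (program, wall time in seconds, #hits), converted from the results table
RESULTS = [
    ("blast+", 2000, 12682),
    ("plast", 354, 12657),
    ("ghostx", 136, 12585),
    ("diamond", 95, 12626),
    ("rapsearch2", 193, 12578),
]


def dominated_by(results):
    """Map each program to the programs that are both faster and find more hits."""
    out = {}
    for name, secs, hits in results:
        out[name] = [
            other
            for other, other_secs, other_hits in results
            if other_secs < secs and other_hits > hits
        ]
    return out


for name, better in dominated_by(RESULTS).items():
    status = "dominated by " + ", ".join(better) if better else "not dominated"
    print(f"{name}: {status}")
```

On this data set only BLAST+, PLAST and DIAMOND are not dominated, so the choice among them is a trade-off between speed and number of hits.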
Caveats: take the sensitivity results of this test with a grain of salt, as query and subject are closely related and the e-value cutoff is fairly stringent. Sensitivity could change a lot in a more realistic setting, for example when searching against NR. The same goes for time and memory usage. I hope to update this post if I end up running other data sets.
h.mon, 7 months ago (modified 7 months ago)