Question: Diamond vs GhostX vs RapSearch2?
7 months ago by
Dgg3260 wrote:

Hi, community. Has anybody compared the three BLAST alternatives, namely DIAMOND, GHOSTX and RAPSearch2? What are the pros and cons of each, and which one stands out from the pack?

I myself have mostly used RAPSearch2, which relies on a reduced alphabet to speed things up. I have used DIAMOND only in its blastx mode, so I can't say much about its sensitivity or other aspects.

I appreciate any info about them. Thank you!

modified 7 months ago by h.mon24k • written 7 months ago by Dgg3260

I'm seriously interested in this as well (and have been doing some preliminary benchmarks myself). I was actually planning on making a similar post, but apparently got scooped.

written 7 months ago by lieven.sterck4.1k

DIAMOND is much faster than RapSearch2 - see the DIAMOND paper. The paper doesn't compare against GHOSTX.

The GHOSTX paper compares against RapSearch (I suppose it is RapSearch2, although the paper says RapSearch throughout). There, GHOSTX is slightly faster than RapSearch and also slightly more sensitive.

Take-away: DIAMOND is much faster than both.

You might be interested in testing PLAST, although I didn't see the same speed-up as reported in this benchmark - for me, DIAMOND is still faster.

modified 7 months ago • written 7 months ago by h.mon24k
7 months ago by
h.mon24k wrote:

update 1: included max memory usage, database size and software versions. Also, PLAST ran ~20 s slower than before, confirmed over 3 runs.

update 2: included RAPSearch2.

So I just ran a very small benchmark. The setup is simple: the queries are the predicted proteins (13498) from a small genome, and the database is the set of annotated proteins (14187) from the NCBI annotation of a close relative, with an e-value cutoff of 1e-5 and one hit per query. I ran DIAMOND, GHOSTX, PLAST, RAPSearch2 and BLAST+, all with one thread. Here are the results:

program    | version |  time  | #hits |  max mem  | db size
blast+     |   2.7.1 | 33m20s | 12682 |   46.4 MB |   8.3 MB
plast      |   2.3.2 |  5m54s | 12657 |  307.1 MB |   8.3 MB
ghostx     |   1.3.7 |  2m16s | 12585 | 1149.7 MB | 303.0 MB
diamond    |  0.9.22 |  1m35s | 12626 |   49.4 MB |   7.6 MB
rapsearch2 |    2.24 |  3m13s | 12578 |  973.4 MB |  63.0 MB
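
To make the table easier to compare at a glance, here is a small sketch that turns the times and hit counts above into a speed-up over BLAST+ and a hit count relative to BLAST+. The numbers are copied from the table; the dictionary layout and the function name `relative_to_blast` are just illustrative choices. Note that the hit ratio only compares counts, it does not check that the programs found the same hits.

```python
# Figures copied from the benchmark table; times converted to seconds.
RESULTS = {
    "blast+":     {"time_s": 33 * 60 + 20, "hits": 12682},
    "plast":      {"time_s": 5 * 60 + 54,  "hits": 12657},
    "ghostx":     {"time_s": 2 * 60 + 16,  "hits": 12585},
    "diamond":    {"time_s": 1 * 60 + 35,  "hits": 12626},
    "rapsearch2": {"time_s": 3 * 60 + 13,  "hits": 12578},
}


def relative_to_blast(results, baseline="blast+"):
    """Return {program: (speedup, hit_ratio)} relative to the baseline program."""
    base = results[baseline]
    return {
        name: (base["time_s"] / r["time_s"], r["hits"] / base["hits"])
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, (speedup, ratio) in relative_to_blast(RESULTS).items():
        print(f"{name:<10} {speedup:5.1f}x vs BLAST+, {ratio:.1%} of BLAST+ hit count")
```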

Conclusions: in this setting, GHOSTX and RAPSearch2 clearly are not the best, as DIAMOND is both faster and more sensitive. BLAST+ is the most sensitive of them all, as expected, but a lot slower, with PLAST in between BLAST+ and DIAMOND both in terms of speed and sensitivity.
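
For reference, the #hits column can be read as "number of queries with at least one hit at or below the e-value cutoff". A minimal sketch of that count is below, assuming BLAST-style 12-column tabular output (query ID in column 1, e-value in column 11). Not every program writes this layout by default, so check each tool's output format first; the function name and the demo rows are made up for illustration.

```python
import csv
import io


def count_queries_with_hit(handle, evalue_cutoff=1e-5):
    """Count distinct query IDs with at least one hit at or below the cutoff.

    Assumes BLAST-style tabular rows: column 1 = query ID, column 11 = e-value.
    """
    queries = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        query_id, evalue = row[0], float(row[10])
        if evalue <= evalue_cutoff:
            queries.add(query_id)
    return len(queries)


if __name__ == "__main__":
    # Made-up demo rows, not real search output.
    demo = io.StringIO(
        "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "q1\ts2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t80\n"
        "q2\ts3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
    )
    print(count_queries_with_hit(demo))
```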

Caveats: take the sensitivity results of this test with a grain of salt, as query and subject are closely related and the cutoff is fairly stringent. Sensitivity could change a lot in a more realistic setting, for example when searching against NR. Of course, the same could be said for time and memory usage. I hope to update this post if I need to run some different data sets.
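
For anyone who wants to repeat this on their own data: the post does not say how time and memory were measured, so the following is only one possible approach, a Unix-only sketch using Python's standard `resource` module. The helper name `run_and_measure` is hypothetical, and the demo command is a placeholder to be replaced by the actual search command line.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd and return (wall_seconds, peak_rss_of_children).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    It is the maximum over all child processes waited for so far, so
    measure one program per Python invocation to avoid mixing results.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


if __name__ == "__main__":
    # Placeholder command; substitute the search program and its arguments.
    seconds, peak = run_and_measure([sys.executable, "-c", "pass"])
    print(f"wall time: {seconds:.2f} s, peak RSS: {peak}")
```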

modified 7 months ago • written 7 months ago by h.mon24k

Did you keep track of how much memory was required by each program? What are the sizes of the database files/indexes?

written 7 months ago by genomax63k

What a great tiny test, I'd give this 5 thumbs up if I could.

written 7 months ago by Carambakaracho930
