Question: Diamond vs GhostX vs RapSearch2?
Dgg32 (Bremen) wrote 14 months ago:

Hi, community. Has anybody compared the three BLAST alternatives Diamond, GhostX and RapSearch2? What are the pros and cons of each, and which one stands out from the pack?

I have used RapSearch2, which relies on a reduced amino acid alphabet to speed things up. I have used Diamond only in its blastx mode, so I can't say much about its sensitivity or other properties.
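To illustrate the reduced-alphabet idea in general terms, here is a minimal sketch. The grouping below is an assumption chosen for illustration (residues grouped by broad physicochemical similarity); it is not RapSearch2's actual alphabet, and `reduce_sequence` and `shared_seeds` are hypothetical helper names.

```python
# Illustrative grouping only; NOT RapSearch2's actual reduced alphabet.
# Each group of similar residues is represented by its first letter.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(a, b, k=4):
    """Return k-mers that both sequences share after alphabet reduction."""
    return kmers(reduce_sequence(a), k) & kmers(reduce_sequence(b), k)


# Two peptides with no exact 4-mer in common still share seeds once reduced.
print(reduce_sequence("MKVLD"), reduce_sequence("LRIIE"))
print(shared_seeds("MKVLD", "LRIIE"))
```

With fewer distinct letters, similar but non-identical peptides collapse to the same seed, so exact seed lookups can still find diverged matches.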

I appreciate any info about them. Thank you!


I'm seriously interested in this as well (and have been running some preliminary benchmarks myself). I was actually planning to make a similar post, but apparently I got scooped.

written 14 months ago by lieven.sterck

DIAMOND is much faster than RapSearch2 - see the DIAMOND paper. The paper doesn't compare against GHOSTX.

The GHOSTX paper compares against RapSearch (I suppose it is RapSearch2, although the paper says RapSearch throughout). There, GHOSTX is slightly faster than RapSearch and also slightly more sensitive.

Take-away: since GHOSTX is only slightly faster than RapSearch, DIAMOND should be much faster than both.

You might also be interested in testing PLAST, although I didn't see the same speed-up as reported in this benchmark. For me, DIAMOND is still faster.

written 14 months ago by h.mon
h.mon (Brazil) wrote 14 months ago:

update 1: added max memory usage, database size and software versions. Also, PLAST ran ~20 s slower than before, confirmed over 3 runs.

update 2: added RAPSearch2.

So I just ran a very small benchmark. The setup is simple: the query is the set of predicted proteins (13498) from a small genome, and the database is the set of annotated proteins (14187) from the NCBI annotation of a close relative, with an e-value cutoff of 1e-5 and one hit per query. I ran DIAMOND, GHOSTX, PLAST, RAPSearch2 and BLAST+, all with one thread. Here are the results:

program    | version |  time  | #hits |  max mem  | db size
blast+     |   2.7.1 | 33m20s | 12682 |   46.4 MB |   8.3 MB
plast      |   2.3.2 |  5m54s | 12657 |  307.1 MB |   8.3 MB
ghostx     |   1.3.7 |  2m16s | 12585 | 1149.7 MB | 303.0 MB
diamond    |  0.9.22 |  1m35s | 12626 |   49.4 MB |   7.6 MB
rapsearch2 |    2.24 |  3m13s | 12578 |  973.4 MB |  63.0 MB

Conclusions: in this setting, GHOSTX and RAPSearch2 are clearly not the best choices, as DIAMOND is both faster and more sensitive (taking the number of hits as a proxy for sensitivity). BLAST+ is the most sensitive of them all, as expected, but much slower. PLAST sits between BLAST+ and DIAMOND in both speed and sensitivity.
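To put numbers on these conclusions, the short script below recomputes the speed-up over BLAST+ and the fraction of BLAST+ hits recovered, using only the times and hit counts from the table above. Treating the hit count as a sensitivity proxy is a simplification, and the helper names are just for illustration.

```python
# Times and hit counts copied from the results table.
RESULTS = {
    "blast+": ("33m20s", 12682),
    "plast": ("5m54s", 12657),
    "ghostx": ("2m16s", 12585),
    "diamond": ("1m35s", 12626),
    "rapsearch2": ("3m13s", 12578),
}


def to_seconds(text):
    """Convert a string such as '33m20s' to a number of seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def summarize(results, baseline="blast+"):
    """Return {program: (speed-up over baseline, % of baseline hits)}."""
    base_time = to_seconds(results[baseline][0])
    base_hits = results[baseline][1]
    return {
        name: (base_time / to_seconds(time), 100.0 * hits / base_hits)
        for name, (time, hits) in results.items()
    }


summary = summarize(RESULTS)
# Print the programs from fastest to slowest.
for name, (speedup, hit_pct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<10} {speedup:5.1f}x faster, {hit_pct:6.2f}% of BLAST+ hits")
```

On this data set DIAMOND comes out roughly 21 times faster than BLAST+ while reporting about 99.6% as many hits, and every program stays above 99%.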

Caveats: take the sensitivity results of this test with a grain of salt, as query and subject are closely related and the e-value cutoff is fairly stringent. Sensitivity could change a lot in a more realistic setting, for example when searching against NR. Of course, the same could be said for time and memory usage. I hope to update this post if I get to run some different data sets.
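For anyone who wants to repeat this on their own data, here is one possible way to record wall time and peak memory per run. It is a sketch and not necessarily how the numbers above were measured. It assumes a Unix system (Python's `resource` module), and the file names in the commented DIAMOND call are hypothetical; check the flags against your installed version.

```python
import resource
import subprocess
import sys
import time

# Harmless placeholder command so the sketch runs without any aligner installed.
SELF_CHECK_CMD = [sys.executable, "-c", "pass"]


def run_and_measure(cmd):
    """Run cmd and return (wall_seconds, max_rss).

    max_rss is ru_maxrss over all child processes this interpreter has
    waited for so far: kilobytes on Linux, bytes on macOS. It is a running
    maximum, so measure one tool per Python process.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, max_rss


elapsed, max_rss = run_and_measure(SELF_CHECK_CMD)
print(f"wall time: {elapsed:.2f} s, max RSS: {max_rss}")

# Example call for a real search (hypothetical file names):
# run_and_measure(["diamond", "blastp", "--query", "query.faa", "--db", "ref",
#                  "--out", "hits.tsv", "--evalue", "1e-5",
#                  "--max-target-seqs", "1", "--threads", "1"])
```

Running each tool several times and reporting the spread, as done for PLAST above, helps separate real differences from disk caching effects.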


Did you keep track of how much memory was required by each program? What are the sizes of the database files/indexes?

written 14 months ago by genomax

What a great little test. I'd give this five thumbs up if I could.

written 14 months ago by Carambakaracho