Diamond vs GhostX vs RapSearch2?
5.7 years ago
Dgg32 ▴ 90

Hi, community. Has anybody compared the three BLAST alternatives, namely Diamond, GhostX and RapSearch2? What are the pros and cons of each, and which one stands out from the pack?

I have used RapSearch2, which relies on a reduced alphabet to speed things up. I have also used Diamond, but only in its blastx mode, so I can't say much about its sensitivity or other properties.

I appreciate any info about them. Thank you!

alignment • 2.5k views

I'm seriously interested in this as well, and have been doing some preliminary benchmarks myself. I was actually planning on making a similar post, but apparently got scooped.


DIAMOND is much faster than RapSearch2 - see the DIAMOND paper. The paper doesn't compare against GHOSTX.

The GHOSTX paper compares against RapSearch (I suppose it is RapSearch2, although the paper says RapSearch throughout): GHOSTX is slightly faster than RapSearch, and also slightly more sensitive.

Take-away: DIAMOND is much faster than both.

You might be interested in testing PLAST, although I didn't see the same speed-up as reported in this benchmark - for me, DIAMOND is still faster.

5.7 years ago
h.mon 35k

update 1: included max memory usage, database size and software versions. Also, PLAST ran ~20s slower than before, confirmed over 3 runs.

update 2: included RAPSearch2.

So I just did a very small benchmark. The setup is simple: the queries are the predicted proteins (13498) from a small genome, the database is the annotated proteins (14187) from the NCBI annotation of a close relative, the e-value cutoff is 1e-5, and one hit is reported per query. I ran DIAMOND, GHOSTX, PLAST, RAPSearch2 and BLAST+, all with one thread. Here are the results:

           | version |  time  | #hits |  max mem  | db size
blast+     |   2.7.1 | 33m20s | 12682 |   46.4 Mb |   8.3 Mb
plast      |   2.3.2 |  5m54s | 12657 |  307.1 Mb |   8.3 Mb
ghostx     |   1.3.7 |  2m16s | 12585 | 1149.7 Mb | 303.0 Mb
diamond    |  0.9.22 |  1m35s | 12626 |   49.4 Mb |   7.6 Mb
rapsearch2 |    2.24 |  3m13s | 12578 |  973.4 Mb |  63.0 Mb

Conclusions: in this setting, GHOSTX and RAPSearch2 are clearly not the best, as DIAMOND is both faster and more sensitive (it finds more hits). BLAST+ is the most sensitive of them all, as expected, but a lot slower, with PLAST in between BLAST+ and DIAMOND in terms of both speed and sensitivity.

Caveats: take the sensitivity results of this test with a grain of salt, as query and subject are closely related and the cutoff is fairly stringent - sensitivity could change a lot in a more realistic setting, such as searching against NR. Of course, the same could be said for time and memory usage. I hope to update this post if I need to run some different data sets.
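
For anyone who wants the relative numbers rather than the raw ones, here is a minimal Python sketch that recomputes speed-up and hit fraction against BLAST+ from the table above. The times and hit counts are copied from the table; the helper names (`to_seconds`, `summarize`) are made up for illustration, and using the hit count relative to BLAST+ as a stand-in for sensitivity is an assumption that inherits all the caveats above.

```python
# Times and hit counts copied from the benchmark table in this post.
results = {
    "blast+": ("33m20s", 12682),
    "plast": ("5m54s", 12657),
    "ghostx": ("2m16s", 12585),
    "diamond": ("1m35s", 12626),
    "rapsearch2": ("3m13s", 12578),
}


def to_seconds(t):
    """Convert a string such as '5m54s' to a number of seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def summarize(results, baseline="blast+"):
    """Return {tool: (speed-up over baseline, fraction of baseline hits)}."""
    base_time = to_seconds(results[baseline][0])
    base_hits = results[baseline][1]
    return {
        tool: (base_time / to_seconds(t), hits / base_hits)
        for tool, (t, hits) in results.items()
    }


summary = summarize(results)

# Print the tools from fastest to slowest.
for tool, (speedup, frac) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{tool:<10} {speedup:5.1f}x vs blast+, {frac:.1%} of blast+ hits")
```

The hit fraction only says how many queries got a hit below the cutoff, not whether the tools agree on which subject is hit, so treat it as a rough proxy.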


Did you keep track of how much memory was required by each program? What are the sizes of the database files/indexes?


What a great tiny test! I'd give this 5 thumbs up if I could.
