Question: OrthoFinder running time
pnatsidis wrote (10 days ago):

Hello,

I am running OrthoFinder on 34 species, averaging 20,000-30,000 proteins each, except 4 of them which have ~60,000 genes. The BLAST all-v-all took 11 days to finish, and now it's on the "Running OrthoFinder algorithm" step. However, I have no clue how long it will take. Has anyone run OrthoFinder on a dataset this large who could give me an estimate? I am running it on 7 nodes, each with 128 GB RAM and 20 cores.

david_emms wrote (9 days ago):

Hi

OrthoFinder developer here!

The very first thing to say is that I've recently added the option to use DIAMOND instead of BLAST, and I'm really impressed with it. From now on I would recommend virtually always using DIAMOND with OrthoFinder: it's about 60x faster in the tests I've done, and the resulting OrthoFinder accuracy is virtually identical. You can see public benchmark results here: http://orthology.benchmarkservice.org/cgi-bin/gateway.pl

Back to your question: as a very rough guess, I'd say less than 5 days to finish, and probably quicker.

To add a bit more detail, I am a little surprised the BLAST calculations took that long, given the kind of computing power you've got there. For example, I ran an analysis on 128 fungal genomes earlier this week, which I think should be pretty similar to your analysis as the total number of sequences is approximately the same (~990,000 versus ~1,280,000). I used only 1 node with 16 cores but used DIAMOND instead of BLAST. The total run time was under 19 hours to get all the orthogroups, gene trees, orthologues etc.!
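
As a rough check on the sequence totals being compared, here is a minimal sketch of the arithmetic for the dataset described in the question. It assumes the 30 smaller proteomes average 25,000 sequences each (the midpoint of the quoted 20,000-30,000 range, which is an assumption, not a figure from the thread); the function name is made up for illustration.

```python
def total_sequences(groups):
    """Sum sequences over (number_of_species, sequences_per_species) pairs."""
    return sum(n_species * per_species for n_species, per_species in groups)


# Dataset from the question: 34 species, 4 of which have ~60,000 genes.
# 25_000 is an assumed midpoint of the quoted 20,000-30,000 range.
question_dataset = [(30, 25_000), (4, 60_000)]

print(total_sequences(question_dataset))  # 990000
```

Under that assumption the total comes to 990,000, in line with the smaller of the two figures quoted above.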

I am working on the performance as we speak, so there are already improvements over previous versions. For example, the latest version uses a new method for getting orthologues from the gene trees instead of dlcpar, which has improved the accuracy and speed of this step significantly, so it is definitely worth considering if you're not using it already. There'll be a new paper coming out very soon which will provide details on these methods, but feel free to email me or message here if I can help with any specific problems with the analysis you're currently running.

All the best

David


Thanks a lot for your answer!! How can I run OrthoFinder with the option of DIAMOND instead of BLAST?

Reply written 6 days ago by pnatsidis

Install DIAMOND on your machine and have it in the system path so that you can call it using "diamond". Then when calling orthofinder you just need to add the option "-S diamond_more_sensitive" for version 2.0.0 or earlier, or just "-S diamond" in future versions.
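
The instructions above could be scripted roughly as follows. This is only a sketch: the `-S` values come from this thread, while the `-f` (input FASTA directory) and `-t` (search threads) options, the `proteomes` directory name and the helper function names are assumptions added for illustration, so check `orthofinder -h` for your installed version.

```python
import shutil


def diamond_on_path():
    """Return True if an executable named 'diamond' can be found on PATH."""
    return shutil.which("diamond") is not None


def build_orthofinder_command(fasta_dir, threads=16, search_program="diamond"):
    """Build the argument list for an OrthoFinder run using DIAMOND.

    search_program should be "diamond_more_sensitive" for version 2.0.0
    or earlier, as stated in the thread. The -f and -t options are
    assumptions not taken from the thread.
    """
    return [
        "orthofinder",
        "-f", fasta_dir,        # directory of proteome FASTA files (hypothetical path)
        "-S", search_program,   # use DIAMOND instead of BLAST
        "-t", str(threads),     # threads for the sequence search
    ]


if __name__ == "__main__":
    if not diamond_on_path():
        print("diamond was not found on PATH; install it before running OrthoFinder")
    # Print the command rather than running it, so it can be inspected first.
    print(" ".join(build_orthofinder_command("proteomes", threads=20)))
```

The command is only printed here; once it looks right it can be passed to `subprocess.run` or pasted into a job script.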

All the best, David

Reply written 5 days ago by david_emms