OrthoFinder: Diamond or Blast? Which is better?
4 months ago
greyman ▴ 160

I saw some research showing that BLAST is 15% more sensitive than DIAMOND. I am interested in using blastp in OrthoFinder, but I can't seem to find the BLAST parameters documented anywhere, and there is nothing about them in the log file from the OrthoFinder output either. How does OrthoFinder select an ortholog from the blastp results, and based on which parameter cut-off? I believe OrthoFinder is better than bidirectional BLAST at finding orthologs, but I hope to find more documentation that supports this. The only relevant input parameter OrthoFinder accepts is -S, for selecting the search program (is there any other BLAST option?). Any advice, suggestions, posts that I have missed on Biostars, or criticism are welcome. Thank you in advance.

orthofinder blast sensitivity ortholog • 531 views
4 months ago
Dunois ★ 1.5k

Just to put the other answer's claim in context: the latest version of DIAMOND (v2.0.7) is as sensitive as blastp while being (at least) 80x faster. I believe OrthoFinder also supports MMseqs2, and there you'd get a 10-12x speedup at BLAST-like levels of sensitivity. OrthoFinder v2.5.2 uses DIAMOND v2.0.6, but that should still be somewhere in this ballpark. So there's no point in NOT going with DIAMOND or MMseqs2 unless you want everything to be really slow for some reason.

OrthoFinder first takes all matches with an e-value of 10^-3 (not sure if this is still the cut-off) from an all-vs-all search of the input sequence sets. Then it carries out a normalization procedure that ensures that all the best hits between all the input species have the same scores irrespective of sequence length and/or phylogenetic distance. Thereafter, for each query gene, the "worst" reciprocal best hit/RBH (or Reciprocal Best length-Normalised hit/RBNH) is identified. All matches that have a score better than this match are taken to form the orthogroup for that query, irrespective of whether they are RBNHs or not. These sets of query-matches are then clustered using the MCL algorithm to yield the orthogroups. Then gene trees are constructed and resolved using these orthogroups, leading to the identification of orthologs (and paralogs, albeit indirectly). More details can be found here, here, and here. (I have to admit that the OrthoFinder documentation was hard to follow, and I actually had to spend quite a bit of time in the GitHub issues to figure out a couple of things for myself.)
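To make the selection step a bit more concrete, here is a minimal Python sketch of the idea as I described it above. It is not OrthoFinder's actual code: the function name `candidate_edges`, the tuple layout, and the toy gene IDs are all made up for illustration, the scores are assumed to be already length-normalised, and the MCL clustering and gene-tree steps are left out entirely. I also use "at least as good as" rather than strictly "better than" so that the reciprocal best hits themselves are kept.

```python
def candidate_edges(hits, species_of, evalue_cutoff=1e-3):
    """Toy version of the hit-selection step (hypothetical helper).

    hits: iterable of (query, target, evalue, normalised_score)
    species_of: dict mapping gene ID -> species label
    Returns the set of (query, target) pairs that would be passed on
    to clustering.
    """
    # 1. Keep hits that pass the e-value cut-off; best score per pair.
    score = {}
    for q, t, evalue, s in hits:
        if q != t and evalue <= evalue_cutoff:
            score[(q, t)] = max(s, score.get((q, t), float("-inf")))

    # 2. Best hit of every query in every *other* species.
    best = {}  # (query, target_species) -> (score, target)
    for (q, t), s in score.items():
        if species_of[q] == species_of[t]:
            continue
        key = (q, species_of[t])
        if key not in best or s > best[key][0]:
            best[key] = (s, t)

    # 3. Reciprocal best hits; remember the lowest-scoring one per query.
    threshold = {}
    for (q, _species), (s, t) in best.items():
        back = best.get((t, species_of[q]))
        if back is not None and back[1] == q:
            threshold[q] = min(s, threshold.get(q, s))

    # 4. Keep every hit scoring at least as well as that "worst" RBH,
    #    whether or not the hit is itself reciprocal.
    return {(q, t) for (q, t), s in score.items()
            if q in threshold and s >= threshold[q]}


# Toy data: three species (A, B, C), invented scores.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
hits = [
    ("a1", "b1", 1e-50, 100), ("b1", "a1", 1e-50, 100),  # reciprocal
    ("a1", "c1", 1e-20, 60),  ("c1", "a1", 1e-20, 60),   # reciprocal, weakest
    ("a1", "a2", 1e-30, 80),  ("a2", "a1", 1e-30, 80),   # within-species
    ("a1", "b2", 1e-5, 40),   ("b2", "a1", 1e-5, 40),    # below threshold
    ("a2", "c1", 0.5, 90),                               # fails e-value
]
edges = candidate_edges(hits, species_of)
print(sorted(edges))
```

In this toy example, a1 keeps its within-species hit to a2 because that hit scores above a1's weakest reciprocal best hit, while the hit to b2 is dropped. That is the sense in which the procedure goes beyond plain reciprocal best hits.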

So in a sense, yes, OrthoFinder is better than just reciprocal best hits/RBHs (e.g., bi-directional BLAST) because the latter does not perform all these additional procedures. And it uses the 10^-3 e-value cut-off to start off with a broad set of matches that can then have the false positives sieved out of them in the subsequent steps.

4 months ago
Chen • 0

In my experience, DIAMOND is much faster, which is more important to me than other factors. I just found a new article comparing DIAMOND with blastp, which is the gold standard. They showed that in "ultra-sensitive" mode, DIAMOND is almost as sensitive as blastp.

https://www.nature.com/articles/s41592-021-01101-x

If you still want to use blastp in OrthoFinder, I believe just adding the flag "-S blast" would work (capital -S; lowercase -s is for supplying a species tree), according to their GitHub page: https://github.com/davidemms/OrthoFinder#options-controlling-the-programs-used
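If you launch OrthoFinder from a script, a small wrapper like the one below can help avoid typos in the search-program option. This is only a sketch: the helper name `orthofinder_command` and the `proteomes/` directory are made up, and the set of accepted values is an assumption limited to the programs mentioned in this thread, so check `orthofinder -h` for your installed version. Note that the search program is selected with a capital -S.

```python
import shlex

# Assumed subset of values accepted by -S; verify with `orthofinder -h`.
SEARCH_PROGRAMS = {"diamond", "blast", "mmseqs"}


def orthofinder_command(fasta_dir, search_program="diamond"):
    """Build (but do not run) an OrthoFinder command line."""
    if search_program not in SEARCH_PROGRAMS:
        raise ValueError(f"unknown search program: {search_program!r}")
    # -f: directory of input FASTA files; -S: sequence search program.
    return ["orthofinder", "-f", fasta_dir, "-S", search_program]


cmd = orthofinder_command("proteomes/", "blast")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once you are happy with it.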