Question: Will lowering the LengthTol parameter increase false positive links in Orthology Groups?
robert.zimmermann10 wrote:

We noticed that a gene key to our study (Brachyury) was not placed in any orthology group in our OMA standalone run, because it is a transcription factor with a short conserved domain. Testing on a small subset suggests that lowering the LengthTol parameter to 0.4 allows it to be included.

I am concerned, though, that this will introduce spurious links between clusters in the final orthology groups. Would that be the case? Is there a pruning step that prevents low intra-group homology?


adrian.altenhoff380 wrote:

Dear Robert,

Disclaimer: I'm one of the developers of OMA standalone. Indeed, lowering the LengthTol parameter might generally link together groups that share only a short common domain. However, there are a few things that you could do:

  1. What type of groups are you looking at? OMA Groups will generally be sparse, but are well suited for species tree reconstruction. If this is your goal, you could probably lower the threshold without risking too much clustering propagation, because in OMA Groups every protein must be a pairwise ortholog of all other members.

  2. In case you realize that OMA Groups might not be the right type of grouping for your application, check whether the protein belongs to a HOG (hierarchical orthologous group). If it does, use this type of grouping.

  3. If your Brachyury genome has many fragments, you might also consider renaming it to Brachyury.contig.fa. This way, the length criterion is skipped for the genes belonging to Brachyury while still being applied to the others. You don't need to recompute the AllAll for this.
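To illustrate point 1, here is a minimal Python sketch of why an all-against-all pairwise requirement limits the propagation of a single spurious link. It is a toy model, not OMA's implementation: the gene names and ortholog pairs are hypothetical, and `connected_components` and `is_clique` are helper names made up for this example.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes by transitive linkage (single-linkage clustering)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def is_clique(members, edges):
    """True if every pair of members is directly linked."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) in edge_set for p in combinations(members, 2))


# Hypothetical pairwise ortholog calls: A1, B1 and C1 are mutually linked;
# C1-D1 stands for one spurious link caused by a short shared domain.
genes = ["A1", "B1", "C1", "D1"]
orthologs = [("A1", "B1"), ("A1", "C1"), ("B1", "C1"), ("C1", "D1")]

components = connected_components(genes, orthologs)
core_is_clique = is_clique({"A1", "B1", "C1"}, orthologs)
all_is_clique = is_clique({"A1", "B1", "C1", "D1"}, orthologs)

print(components)      # transitive linkage pulls D1 into the group
print(core_is_clique)  # the three mutually linked genes form a clique
print(all_is_clique)   # adding D1 breaks the all-pairs requirement
```

Under transitive linkage the single C1-D1 link is enough to merge D1 into the group, whereas the all-pairs requirement keeps it out unless D1 is also linked to A1 and B1.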

Hope this might help you. Feel free to get back to us. Adrian

Dear Adrian,

Thanks very much for your comment! That's quite helpful.

In fact, we are only interested in HOGs as we are using OMA as a quick way to determine gene gains, losses and duplications of a subset of genes of interest in all 5 taxa we are investigating.

Note that Brachyury is a gene (also known as "T"), not a genome. In this case, all examples of Brachyury that are incorrectly grouped are complete and verified. We found that lowering the LengthTol parameter to 0.4 groups the gene correctly in some taxa, but owing to a very short match between two taxa of our study, it is only grouped together when LengthTol is as low as 0.35. The question becomes: how crucial is LengthTol to the correct formation of HOGs? Did you ever find many cases where short, spurious perfect matches misplaced a gene as an in-paralog or ortholog?
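For readers following along, the effect of the cutoff can be sketched in a few lines of Python. This assumes the criterion is that an alignment is kept only if its length is at least LengthTol times the length of the shorter sequence, and that 0.61 is the default value; both are assumptions worth checking against the parameter file shipped with OMA standalone. The sequence and alignment lengths are hypothetical, not our real data.

```python
def passes_length_tol(aligned_len, len1, len2, length_tol):
    """Assumed criterion: the alignment must cover at least
    length_tol times the length of the shorter sequence."""
    return aligned_len >= length_tol * min(len1, len2)


# Hypothetical protein pairs: (aligned length, length 1, length 2).
pairs = {
    "taxon1_vs_taxon2": (180, 435, 520),  # covers about 41% of the shorter protein
    "taxon1_vs_taxon3": (160, 435, 610),  # covers about 37% of the shorter protein
}

results = {}
for name, (aligned, len1, len2) in pairs.items():
    for tol in (0.61, 0.4, 0.35):  # 0.61 is the assumed default
        results[(name, tol)] = passes_length_tol(aligned, len1, len2, tol)
        print(name, tol, results[(name, tol)])
```

With these made-up numbers the first pair is accepted from 0.4 downwards, while the second is accepted only at 0.35, which mirrors the situation described above.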

Dear Robert,

Sorry about the misunderstanding. Another parameter that you might consider lowering a bit is "ReachabilityCutoff" or "MinEdgeCompletenessFraction" (depending on whether you use top-down or bottom-up HOG inference). This might be especially helpful if the Brachyury genes do have pairwise ortholog relations with the higher LengthTol.
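As a rough intuition for the edge-completeness idea, here is a toy Python sketch. It assumes the parameter acts as a minimum on the fraction of possible cross-cluster ortholog pairs that are actually observed before two clusters are merged; this is a simplification for illustration, not the actual HOG inference code. The gene names, the 0.65 starting value, and the helpers `edge_completeness` and `should_merge` are all hypothetical.

```python
from itertools import product


def edge_completeness(cluster_a, cluster_b, ortholog_pairs):
    """Fraction of all possible a-b pairs that are observed ortholog pairs."""
    observed = {frozenset(p) for p in ortholog_pairs}
    possible = len(cluster_a) * len(cluster_b)
    present = sum(
        frozenset((a, b)) in observed for a, b in product(cluster_a, cluster_b)
    )
    return present / possible


def should_merge(cluster_a, cluster_b, ortholog_pairs, min_fraction):
    """Merge two clusters only if enough cross-cluster edges exist."""
    return edge_completeness(cluster_a, cluster_b, ortholog_pairs) >= min_fraction


# Hypothetical Brachyury genes from five taxa, split into two sub-clusters.
cluster_a = ["sp1_T", "sp2_T"]
cluster_b = ["sp3_T", "sp4_T", "sp5_T"]
# Only 3 of the 6 possible cross-cluster pairs were called as orthologs.
ortholog_pairs = [("sp1_T", "sp3_T"), ("sp1_T", "sp4_T"), ("sp2_T", "sp3_T")]

fraction = edge_completeness(cluster_a, cluster_b, ortholog_pairs)
merge_strict = should_merge(cluster_a, cluster_b, ortholog_pairs, 0.65)
merge_relaxed = should_merge(cluster_a, cluster_b, ortholog_pairs, 0.5)
print(fraction, merge_strict, merge_relaxed)
```

In this toy case half of the possible pairs are present, so the clusters stay apart at 0.65 but are merged once the threshold is relaxed to 0.5.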

To answer the question about false positives in HOGs when lowering the LengthTol parameter: I have certainly observed this for multidomain genes that share only a partial history, e.g. through gene fission events. Otherwise, it should not be a major problem. So my advice is to go with the lower cutoff.

david_emms40 wrote:


It's worth trying OrthoFinder; it has a correction for gene length that addresses this exact problem (short genes getting excluded). If you run it using DIAMOND it's extremely fast, so it'd be very easy to get results and compare them with OMA.

All the best


