Hi there,
I am trying to understand how "between_species_paralog" is defined in Ensembl and EnsemblGenomes, and found the following definitions:
For Ensembl: (http://www.ensembl.org/info/genome/compara/homology_method.html)
Currently, we only annotate between_species_paralog when there is no better match for any of the genes, and the duplication is weakly-supported (duplication confidence score ≤ 0.25).
For EnsemblGenomes: (http://fungi.ensembl.org/info/genome/compara/homology_method.html)
When the node in the gene-tree is labelled as dubious (i.e. has a duplication confidence score of 0)
When there is no better match for any of the genes (regardless of the duplication confidence score)
When at least one gene does not have a better match, and the duplication is weakly-supported (duplication confidence score ≤ 0.25)
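To check that I am reading these rules correctly, here is a minimal Python sketch of how I interpret them. The names `GenePair`, `ensembl_rule` and `ensembl_genomes_rule` are my own and do not come from the Ensembl code base, and I am assuming that "no better match for any of the genes" means that neither gene in the pair has a better match.

```python
from dataclasses import dataclass

# Threshold quoted in the documentation for a weakly-supported duplication.
WEAK_SUPPORT_THRESHOLD = 0.25


@dataclass
class GenePair:
    """Hypothetical container for a pair of genes from two different species."""
    duplication_confidence: float   # score of the duplication node joining the pair
    gene_a_has_better_match: bool   # assumption: computed elsewhere
    gene_b_has_better_match: bool   # assumption: computed elsewhere


def ensembl_rule(pair: GenePair) -> bool:
    """My reading of the Ensembl rule: both conditions must hold."""
    no_better_match_for_any = not (pair.gene_a_has_better_match
                                   or pair.gene_b_has_better_match)
    weakly_supported = pair.duplication_confidence <= WEAK_SUPPORT_THRESHOLD
    return no_better_match_for_any and weakly_supported


def ensembl_genomes_rule(pair: GenePair) -> bool:
    """My reading of the EnsemblGenomes rules: any one of the three cases suffices."""
    dubious = pair.duplication_confidence == 0
    no_better_match_for_any = not (pair.gene_a_has_better_match
                                   or pair.gene_b_has_better_match)
    at_least_one_without = not (pair.gene_a_has_better_match
                                and pair.gene_b_has_better_match)
    weakly_supported = pair.duplication_confidence <= WEAK_SUPPORT_THRESHOLD
    return (dubious
            or no_better_match_for_any
            or (at_least_one_without and weakly_supported))


# A well-supported duplication (score 1) where neither gene has a better match.
pair = GenePair(duplication_confidence=1.0,
                gene_a_has_better_match=False,
                gene_b_has_better_match=False)
print(ensembl_rule(pair), ensembl_genomes_rule(pair))
```

Under this reading, the example pair would not be a between_species_paralog by the Ensembl rule but would be one by the second EnsemblGenomes rule, which is part of what I would like to have confirmed.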
As I understand it, the higher the duplication confidence score, the more likely the node is a true duplication event. For example, the duplication confidence score for Mmus1:Hsap2 (from Figure 1 on http://www.ensembl.org/info/genome/compara/homology_method.html) is 1, and as a result, their ancestral node is defined as a duplication node. But this seems to be inconsistent with the definitions above?
In addition, how should I interpret the three cases listed above for EnsemblGenomes? Are there any specific examples that illustrate them?
Many thanks,
Best regards,
Tom
We'll look into the expanded query you sent to the Ensembl helpdesk, answer in detail there, and post the highlights here for anyone who finds this thread later.