Question: Parsing Protein Trees to determine orthologs and paralogs
6.1 years ago by
United States
Hi Everyone,

I'm trying to find orthologs and lineage-specific paralogs between two species. I tried using the Ensembl homology pipeline, but neither of my species is in the database, so I tried to write a similar pipeline of my own. So far I've done the following:

1. All-vs-all BLAST of every gene in both genomes

2. Filtering of BLAST results based on e-value and alignment length

3. Clustering with MCL (Markov clustering) to form gene families

4. For each gene family, I made a protein alignment with PRANK and built a tree with TreeBeST, which also takes the species tree as input and tries to build a gene tree consistent with it.
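The filtering in step 2, and the hand-off to MCL in step 3, can be sketched in Python as below. This is a minimal illustration, not the poster's actual script: it assumes BLAST+ tabular output with the default `-outfmt 6` columns, the cut-offs (`max_evalue`, `min_length`) and the weight used for e-values of 0 are placeholder values, and using -log10(e-value) as the edge weight in MCL's `abc` format is one common choice rather than something the post specifies.

```python
import math
from typing import Iterable, Iterator, Tuple

# Default -outfmt 6 columns: query id, subject id, pident, length, mismatch,
# gapopen, qstart, qend, sstart, send, evalue, bitscore.
QUERY, SUBJECT, LENGTH, EVALUE = 0, 1, 3, 10


def filter_hits(lines: Iterable[str],
                max_evalue: float = 1e-5,
                min_length: int = 100,
                zero_evalue_weight: float = 200.0) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, subject, weight) for hits passing both thresholds.

    Thresholds and zero_evalue_weight are illustrative placeholders.
    The weight is -log10(e-value); e-values reported as 0 get zero_evalue_weight.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[QUERY], cols[SUBJECT]
        if query == subject:
            continue  # drop self-hits
        evalue = float(cols[EVALUE])
        length = int(cols[LENGTH])
        if evalue > max_evalue or length < min_length:
            continue
        weight = zero_evalue_weight if evalue == 0 else -math.log10(evalue)
        yield query, subject, weight


def to_mcl_abc(hits: Iterable[Tuple[str, str, float]]) -> str:
    """Format hits as MCL 'abc' input: 'label1<TAB>label2<TAB>weight' per line."""
    return "".join(f"{q}\t{s}\t{w:.2f}\n" for q, s, w in hits)
```

The resulting text can be written to a file and clustered with, for example, `mcl hits.abc --abc -o families.txt`; MCL's inflation option (`-I`) controls how finely the families are split.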

My question is about parsing these gene trees. I want to use them to find paralogs and orthologs between my two species, but I'm not sure how to parse the tree topologies to determine which relationships are paralogous and which are orthologous.

Are there any programs that can take in gene trees and output a list of paralogs and orthologs?



6.1 years ago by lchau9120

Not to my knowledge. I would probably use the ETE library for Python to write a parser to do the job.

written 6.1 years ago by Joseph Hughes
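ETE parses Newick trees for you, which is why it is suggested above. Purely to show what the parsing step involves, here is a dependency-free sketch that turns a Newick string into nested Python tuples of leaf names. It is a simplification, not ETE's parser: quoted labels and labels containing spaces are not handled, and branch lengths, internal-node labels and `[...]` comments (such as NHX annotations) are discarded.

```python
import re
from typing import Union

# Nested tuples of leaf names; a leaf is just its name.
Tree = Union[str, tuple]

_PUNCT = {"(", ")", ",", ";"}
# Tokens: [...] comments (e.g. NHX tags), ":<branch length>", punctuation, bare labels.
_TOKEN = re.compile(r"\[[^\]]*\]|:[^,();\[]*|[(),;]|[^(),;:\[\s]+")


def parse_newick(text: str) -> Tree:
    """Parse a Newick string into nested tuples of leaf names.

    Simplifications: branch lengths, internal-node labels and [...] comments
    are dropped; quoted labels and labels containing spaces are not supported.
    """
    # Keep only structure tokens and labels; drop comments and branch lengths.
    tokens = [t for t in _TOKEN.findall(text) if t[0] not in "[:"]
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            if tokens[pos] != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1
            # Skip an optional internal-node label (e.g. a support value).
            if pos < len(tokens) and tokens[pos] not in _PUNCT:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        if label in _PUNCT:
            raise ValueError(f"expected a leaf name at token {pos}")
        pos += 1
        return label

    tree = node()
    if pos >= len(tokens) or tokens[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree
```

For example, `parse_newick("((hs_a:0.1,mm_a:0.2)n1:0.05,((hs_b1,hs_b2),mm_b[&&NHX:S=mm]));")` gives `(("hs_a", "mm_a"), (("hs_b1", "hs_b2"), "mm_b"))`. For real TreeBeST output, ETE's own parser is the safer choice because it keeps the annotations.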

I think it could still be worthwhile to contact the Ensembl helpdesk, or the person in their Compara team who handles the gene trees (Matthieu Muffato), as they would happily give you their software (for tree building and/or orthology/paralogy inference) as well as advice.

written 6.1 years ago by Bert Overduin

I ended up using the ETE library to write a parser for my gene trees, but I've also contacted the Ensembl helpdesk. I'm still a novice at writing my own scripts, so I'll compare the two results to see how my skills measure up!

Thank you everyone!

written 6.1 years ago by lchau9120
jhc wrote:

The ETE toolkit can indeed do that. You can use the species overlap algorithm to detect duplication and speciation events, or reconcile your gene tree with the expected species tree (this paper includes a comparison of both methods).

Briefly, you will need to load your gene tree as a PhyloTree object and then call either the tree.get_descendant_evol_events or the tree.reconcile method. Both methods process your tree, label each node as a speciation or a duplication, and return a list of speciation and duplication events. You can then visualize the tree to inspect the predictions, or process the list of events directly. If you use tree.reconcile, a reconciled PhyloTree object is also returned, including the inferred lost branches, which can be visualized as well. There are some examples in the ETE tutorial showing how to get orthology/paralogy predictions from gene trees.

written 6.1 years ago by jhc
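To make the species overlap idea mentioned in this answer concrete, here is a toy, dependency-free version of the rule. It is not ETE's implementation: on a rooted gene tree, an internal node is called a duplication if two of its children share a species, otherwise a speciation; gene pairs whose last common ancestor is a speciation are reported as orthologs and pairs whose last common ancestor is a duplication as paralogs. The nested-tuple tree format and the `<species>_<gene>` leaf naming are assumptions made for this example.

```python
from itertools import product
from typing import List, Set, Tuple, Union

# A rooted gene tree as nested tuples; leaves are gene names.
Tree = Union[str, tuple]


def species_of(leaf: str) -> str:
    # Hypothetical naming convention: "<species>_<gene>", e.g. "hs_BRCA1".
    return leaf.split("_", 1)[0]


def leaves(node: Tree) -> List[str]:
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def classify(tree: Tree) -> Tuple[Set[frozenset], Set[frozenset]]:
    """Label nodes with the species overlap rule and collect gene pairs.

    A node is a duplication if any two of its children share a species,
    otherwise a speciation. Each gene pair is assigned according to the node
    where their lineages meet: speciation -> ortholog, duplication -> paralog.
    """
    orthologs: Set[frozenset] = set()
    paralogs: Set[frozenset] = set()

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        child_leaves = [leaves(child) for child in node]
        child_species = [{species_of(g) for g in genes} for genes in child_leaves]
        n = len(child_leaves)
        is_duplication = any(child_species[i] & child_species[j]
                             for i in range(n) for j in range(i + 1, n))
        target = paralogs if is_duplication else orthologs
        # Pairs split by this node have it as their last common ancestor.
        for i in range(n):
            for j in range(i + 1, n):
                for a, b in product(child_leaves[i], child_leaves[j]):
                    target.add(frozenset((a, b)))
        for child in node:
            visit(child)

    visit(tree)
    return orthologs, paralogs
```

With `classify((("hs_a", "mm_a"), (("hs_b1", "hs_b2"), "mm_b")))`, the ortholog pairs are `hs_a`/`mm_a`, `hs_b1`/`mm_b` and `hs_b2`/`mm_b`. Among the paralogs, `hs_b1`/`hs_b2` meet at a duplication node covering a single species, i.e. a lineage-specific duplication, whereas `hs_a`/`hs_b1` descend from a duplication that predates the speciation.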
Maybe?

written 6.1 years ago by d3p10
