Question: Parsing Protein Trees to determine orthologs and paralogs
6.1 years ago, lchau91 (United States) wrote:

Hi Everyone,

I'm trying to find orthologs and lineage-specific paralogs between two species. I tried using the Ensembl homology pipeline, but neither of my species is in the database, so I tried to write my own similar pipeline. So far I've accomplished the following:

1. All-against-all BLAST of every gene in both genomes

2. Filtering of the BLAST results based on e-value and alignment length

3. Single-linkage clustering with MCL to form gene families

4. For each gene family, I did a protein alignment with PRANK and built a tree with TreeBeST, which also takes in the species tree and tries to build the gene tree accordingly.
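
For readers following along, step 2 could look roughly like the sketch below. It assumes the BLAST results were written in the default 12-column tabular format (`-outfmt 6`). The cutoffs (`max_evalue=1e-5`, `min_length=50`) and the function name `filter_hits` are placeholders chosen for illustration, not the values or code actually used in this pipeline.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6) with the default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_length=50):
    """Yield (query, subject, evalue, length) for hits passing both cutoffs.

    The default thresholds are hypothetical; tune them for your data.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        length = int(hit["length"])
        if evalue <= max_evalue and length >= min_length:
            yield hit["qseqid"], hit["sseqid"], evalue, length


# Made-up example rows: a strong hit, a short hit, and a weak hit.
example = (
    "spA_g1\tspB_g1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "spA_g1\tspB_g7\t90.0\t20\t2\t0\t1\t20\t5\t24\t1e-8\t45\n"
    "spA_g2\tspB_g9\t30.0\t150\t100\t5\t1\t150\t1\t150\t0.5\t25\n"
)
for hit in filter_hits(io.StringIO(example)):
    print(hit)
```

Only the first row passes both cutoffs here; the second is too short and the third has too high an e-value.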

My question deals with parsing these gene trees. I want to use these gene trees to find paralogs and orthologs between my two species but I'm not sure how to parse all of the topologies of these trees and how to determine paralogous or orthologous relationships.

Are there any programs that can take in gene trees and output a list of paralogs and orthologs?




Not to my knowledge. I would probably use the ETE library for Python to write a parser to do the job.

Reply written 6.1 years ago by Joseph Hughes

I think it could still be worthwhile to contact the Ensembl helpdesk or the person within their Compara team who deals with the gene trees (Matthieu Muffato), as they would happily give you their software (for tree building and/or homology/paralogy inference) as well as any advice.

Reply written 6.1 years ago by Bert Overduin

I ended up using the ETE library to write a parser for my gene trees, but I've also contacted the Ensembl helpdesk. I'm still a novice at writing my own scripts, so I'll do a comparison to see how my skills match up!

Thank you everyone!

Reply written 6.1 years ago by lchau91

6.1 years ago, jhc wrote:

The ETE toolkit is indeed capable of doing that. You can use a species overlap algorithm to detect duplication and speciation events, or reconcile your gene tree with the expected species trees (this paper includes a comparison of both methods).

Briefly, you will need to load your gene tree as a PhyloTree object and then call either the tree.get_descendant_evol_events or the tree.reconcile method. Both methods process your tree, label the nodes as speciation or duplication, and return a list of speciation and duplication events. You can then visualize your tree to see the predictions, or process the list of events. If you used tree.reconcile, a reconciled-tree PhyloTree object will also be returned, including the inferred lost branches. There are some examples in the ETE tutorial showing how to get orthology/paralogy predictions based on gene trees.
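
To make the species overlap idea concrete, here is a minimal standard-library sketch. It is not ETE's implementation and is not taken from the ETE tutorial. It only illustrates the rule that an internal node is called a duplication when its two child clades share at least one species, and a speciation otherwise. The nested-tuple tree format, the `<species>_<gene>` leaf naming convention, and the names `species_of`, `classify`, and `homolog_pairs` are all assumptions made for this example, and the tree is assumed to be strictly binary.

```python
from itertools import product


def species_of(leaf_name):
    """Hypothetical naming convention: '<species>_<gene id>'."""
    return leaf_name.split("_", 1)[0]


def classify(tree, events=None):
    """Label every internal node of a binary nested-tuple tree.

    Returns (leaf_names, events), where each event is a tuple
    (etype, left_leaves, right_leaves) and etype is "D" (duplication)
    or "S" (speciation). Events are collected in post-order.
    """
    if events is None:
        events = []
    if isinstance(tree, str):  # a leaf
        return [tree], events
    left, right = tree
    left_leaves, _ = classify(left, events)
    right_leaves, _ = classify(right, events)
    shared = ({species_of(n) for n in left_leaves}
              & {species_of(n) for n in right_leaves})
    events.append(("D" if shared else "S", left_leaves, right_leaves))
    return left_leaves + right_leaves, events


def homolog_pairs(tree):
    """Split all leaf pairs into orthologs and paralogs.

    Each pair is classified by the event type of the node where the two
    leaves diverge (their last common ancestor in the gene tree).
    """
    _, events = classify(tree)
    orthologs, paralogs = set(), set()
    for etype, left, right in events:
        target = paralogs if etype == "D" else orthologs
        for a, b in product(left, right):
            target.add(frozenset((a, b)))
    return orthologs, paralogs


# Made-up gene family: one ancient duplication at the root and one
# duplication inside species spA.
example_tree = (("spA_g1", "spB_g1"), (("spA_g2", "spA_g3"), "spB_g2"))
orthologs, paralogs = homolog_pairs(example_tree)
print(sorted(sorted(p) for p in orthologs))
print(sorted(sorted(p) for p in paralogs))
```

Restricting the paralog pairs to those where both genes come from the same species gives a rough list of within-species paralogs. Whether a given pair is truly lineage-specific still depends on where its duplication node sits relative to the speciation between the two species, so treat this as a starting point for the event lists that ETE returns rather than a replacement for them.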

6.1 years ago, d3p (United States) wrote:

maybe ?
