Question: Identifying paralogs using phylogenetic tree
ashish200 wrote:

How do you identify paralogs from a phylogenetic tree? I was wondering if we can do it like this: 1. Run BLAST and filter out hits with less than 85% identity. 2. Build a phylogenetic tree with bootstrap support. 3. Filter out those BLAST hits that do not group together in the phylogenetic tree.

But it's not clear to me which ones are paralogs in a phylogenetic tree. See this tree (link): it is a tree of a family of proteins from a single species. Are the ones marked in red paralogs? It would be really helpful if someone could elaborate on identifying paralogs in a phylogenetic tree. Thanks.
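For reference, step 1 of the workflow above could look roughly like the sketch below. It assumes the BLAST results were saved in the 12-column tabular format (`-outfmt 6`), where the third column is percent identity, and that "filter" means discarding hits below the threshold. The function name `filter_hits` and the sample rows are made up for illustration.

```python
import csv
import io


def filter_hits(tabular_text, min_identity=85.0):
    """Keep BLAST tabular hits whose percent identity is >= min_identity."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, identity = row[0], row[1], float(row[2])
        if query != subject and identity >= min_identity:  # drop self-hits
            kept.append((query, subject, identity))
    return kept


# Made-up example rows: qseqid, sseqid, pident, then the remaining columns.
example = (
    "q1\tq1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600\n"
    "q1\ts2\t91.5\t298\t25\t0\t1\t298\t1\t298\t1e-150\t500\n"
    "q1\ts3\t62.0\t280\t106\t2\t5\t284\t3\t282\t1e-60\t210\n"
)
hits = filter_hits(example)
print(hits)
```

Keep in mind that an identity cutoff only selects close homologs; by itself it says nothing about whether a hit is a paralog or an ortholog.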


You might want to take a look at: They did this for fungi and were able to track down paralogs, especially after the whole genome duplication event.

Apply the definition: two genes are paralogs if their last common ancestor is a duplication event.
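A minimal sketch of applying that definition, assuming a rooted binary gene tree written as nested tuples and leaf labels of the form `species|gene` (both conventions are mine, not from the thread). Since the true events are unknown, it uses the species-overlap heuristic: a node is called a duplication when its two child subtrees share at least one species.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(leaf):
    # Assumed label convention: "species|gene_id"
    return leaf.split("|")[0]


def paralog_pairs(tree):
    """Collect gene pairs whose last common ancestor looks like a duplication."""
    if isinstance(tree, str):
        return set()
    left, right = tree  # assumes a binary (fully resolved) tree
    pairs = paralog_pairs(left) | paralog_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species_of(x) for x in left_leaves} & {species_of(x) for x in right_leaves}
    if shared:
        # This node is the last common ancestor of every left/right pair.
        for a, b in product(left_leaves, right_leaves):
            pairs.add(frozenset((a, b)))
    return pairs


# Hypothetical family: one duplication before the human/mouse split.
family = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "yeast|A")
# Hypothetical single-species family, like the tree in the question.
single = (("sp|g1", "sp|g2"), "sp|g3")

for pair in sorted(tuple(sorted(p)) for p in paralog_pairs(family)):
    print(pair)
```

For a tree built from a single species, every internal node has overlapping species, so under this rule every pair of genes in the tree comes out as paralogs, not only a marked subset.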

Leo Martins220 wrote:

I guess you are asking about gene tree reconciliation. Basically, if you know the species tree, then you can label every internal node of your gene tree -- the one you are estimating -- as a speciation or a duplication, and infer losses along its branches. From that you can tell which sequences are paralogous to which others. Note that the gene tree may have several leaves labelled by (pointing to) the same species, while each species appears only once in the species tree.
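To make the idea concrete, here is a rough sketch of the simplest form of reconciliation, the LCA mapping: each gene tree node is mapped to the smallest species tree clade that contains all of its species, and a node is called a duplication when it maps to the same clade as one of its children. The nested-tuple trees, the `species|gene` leaf labels and the function names are assumptions made for this example, and losses are not counted here.

```python
def clades(species_tree):
    """List the species set under every node of the species tree (postorder)."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    found, members = [], frozenset()
    for child in species_tree:
        sub = clades(child)
        found.extend(sub)
        members |= sub[-1]  # the last entry is the child's own clade
    found.append(members)
    return found


def reconcile(gene_tree, all_clades, events):
    """Map a gene tree node onto the species tree and record its event.

    Returns (species tree clade the node maps to, gene labels under the node).
    """
    if isinstance(gene_tree, str):
        return frozenset([gene_tree.split("|")[0]]), [gene_tree]
    child_maps, genes = [], []
    for child in gene_tree:
        mapped, below = reconcile(child, all_clades, events)
        child_maps.append(mapped)
        genes.extend(below)
    species = frozenset(g.split("|")[0] for g in genes)
    # LCA mapping: the smallest species tree clade containing these species.
    here = min((c for c in all_clades if species <= c), key=len)
    kind = "duplication" if here in child_maps else "speciation"
    events.append((tuple(genes), kind))
    return here, genes


# Hypothetical species tree and gene family.
species_tree = (("human", "mouse"), "yeast")
gene_tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "yeast|A")

events = []
reconcile(gene_tree, clades(species_tree), events)
for genes, kind in events:
    print(kind, genes)
```

Two genes are then paralogs when the node where their lineages join is labelled as a duplication. Unlike a plain check for shared species, this mapping also flags nodes whose grouping contradicts the species tree, for example a human and a yeast gene grouping together to the exclusion of a mouse gene.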

Assuming that you don't know the species tree, you can try to infer the species tree(s) that minimize the number of duplications and losses, e.g. with the software iGTP. You can also use more sophisticated models for finding the species tree.

(PS: This is not what you are asking, but AFAIU the most common methods for finding orthologs do not rely on the gene family tree, and use pairwise distances instead)

Thanks for giving me the exact keywords for a Google search. This really helps. I will read about the things you have mentioned. I wanted to ask: after finding orthologs using pairwise distances, suppose we build a tree from our query sequences and the potential orthologs from the BLAST output. If we then remove the potential orthologs that do not group with our query sequences, will that improve the final results, or are the sequences we filtered out distant homologs that should not be removed?

