How To Edit Phylogenetic Trees As Per Required Output, And What Software Is Available To Do The Same
10.6 years ago
H@rry ▴ 30

Hi,

I am working on the evolutionary biology of a family of proteins in plants, human and yeast, and I have a few questions for the experts:

Is it a good idea to modify a phylogenetic tree to fit our own requirements when some members of the same family do not cluster together? If yes, please recommend some free software for this. If no, how could such a result be explained in a publication? How does one explain that some members of a gene family are not close to each other, or overlap with a different gene family? I tried some software but was not able to get the tree I wanted.

Any suggestions would be appreciated.

Thank you in advance

10.6 years ago
Josh Herr 5.8k

There are numerous ways to manipulate a phylogenetic tree to your liking -- both through programs (I like FigTree, but I have a co-worker who swears by TreeView, and there are many others) and, easily, by editing your tree file in text format.
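For purely presentational edits, such as giving tips readable names, the tree file can be changed without touching the topology or branch lengths. Here is a minimal sketch in Python, assuming an unquoted Newick string. The tip labels and the `rename_tips` helper are hypothetical and made up for illustration.

```python
import re

# Matches a tip label: a run of non-punctuation characters that directly
# follows "(" or ",". Branch lengths follow ":" and internal node labels
# follow ")", so neither is touched. Quoted labels are NOT handled.
TIP_LABEL = re.compile(r"(?<=[(,])([^(),:;]+)")


def rename_tips(newick: str, names: dict) -> str:
    """Return the Newick string with tip labels replaced via `names`.

    Labels missing from `names` are left as they are. Topology and
    branch lengths are not changed.
    """
    return TIP_LABEL.sub(lambda m: names.get(m.group(1), m.group(1)), newick)


if __name__ == "__main__":
    # Hypothetical tree and labels, for illustration only.
    tree = "((plantA:0.1,plantB:0.2):0.3,(yeastA:0.4,humanA:0.5):0.1);"
    names = {"plantA": "Arabidopsis_1", "humanA": "Human_1"}
    print(rename_tips(tree, names))
```

This only relabels the figure. It does not move any branch, which is the kind of edit that stays on the safe side of the concern raised in the rest of this answer.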

I'm actually concerned you would want to modify your phylogenetic tree -- to me this is akin to saying "I don't like how my data looks so I will modify the figures to show what I think it should look like."

If you think something looks suspect in your phylogenetic tree, I would first go evaluate the process of constructing your phylogenetic tree -- from sequence selection to alignment (it's very important to inspect by eye!) to the analysis process -- and inspect each step of your analysis. From my experience many people do not know how to properly construct a phylogenetic tree and I see many extremely poor examples in the literature. If your tree does not look the way you would "expect" your first step should be to assess your data analysis pipeline at every step of the analysis.

If after a series of quality control steps you find the same "surprising" clustering with branches, then you should attempt to come up with a reasonable explanation of why. Since this is your study gene and you know the most about it you will be the best candidate to explain why you are seeing the clustering you find in your robust phylogenetic analysis.


Thank you for the information.

I tried some of the manipulation options in TreeView, but they did not really work well. I would rather use a different method to construct the tree and see whether something better comes out of it that fits my requirements. I agree that quality control is most important in this kind of analysis.

10.6 years ago
DG 7.3k

Like Josh, I would express some concern at what appears to be a desire to use a program to fix your tree. Single-gene phylogenies will not always reflect the expected organismal phylogenetic relationships, for any number of reasons: some of them biological, and some due to potential methodological issues. This is particularly true when you start looking at gene families. Multiple duplications of some genes likely exist, which can confuse the analysis if you are not careful with your selection of orthologs and paralogs.

Improper taxon sampling can create issues (missing or rogue taxa can both create problems, and for different reasons). Differential gene loss, gene replacement, lateral transfer... all of these can create legitimate biological confusion. Long-branch attraction or repulsion can cause sequences to cluster where they shouldn't. Composition effects can cause the same thing through convergent evolution.
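As a rough first screen for composition effects, one can compare each sequence's amino-acid composition with the pooled composition of the whole dataset. The sketch below is an assumption-laden illustration, not a formal test: the function names and the example sequences are hypothetical, and the score is a plain chi-square-style distance with no p-value attached.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_counts(seq: str) -> Counter:
    """Count the 20 standard amino acids, ignoring gaps and ambiguity codes."""
    return Counter(c for c in seq.upper() if c in AMINO_ACIDS)


def composition_deviation(seqs: dict) -> dict:
    """Chi-square-style distance of each sequence from the pooled composition.

    `seqs` maps a sequence name to its (aligned or unaligned) protein sequence.
    Larger scores mean a composition that differs more from the dataset average.
    """
    per_seq = {name: aa_counts(s) for name, s in seqs.items()}
    pooled = Counter()
    for counts in per_seq.values():
        pooled.update(counts)
    total = sum(pooled.values())
    scores = {}
    for name, counts in per_seq.items():
        n = sum(counts.values())
        score = 0.0
        for aa in AMINO_ACIDS:
            # Expected count if this sequence had the pooled composition.
            expected = n * pooled[aa] / total if total else 0.0
            if expected > 0:
                score += (counts[aa] - expected) ** 2 / expected
        scores[name] = score
    return scores


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {"seq1": "ACDEACDE", "seq2": "ACDE-ACDE", "odd": "AAAAAAAA"}
    for name, score in sorted(composition_deviation(toy).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{name}\t{score:.2f}")
```

Sequences at the top of the list are candidates for a closer look. A high score alone does not show that composition is what misplaces them in the tree.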

And, keep in mind that your underlying annotations may possibly be incorrect as well. I've seen it happen in many datasets.

My suggestion is to first evaluate your method of reconstructing the phylogeny. For instance, you should be using a full maximum-likelihood (or Bayesian) method instead of simple neighbor-joining. If you are using an ML or Bayesian method, then evaluate the model you are using (LG versus WAG or JTT, for instance, with proteins). Using a program like ModelTest may take time, but it will give you some insight into whether you are misspecifying your model.
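Model-selection programs typically rank candidate models with an information criterion such as AIC, defined as 2k - 2 lnL, where k is the number of free parameters and lnL is the maximized log-likelihood. The sketch below shows only that arithmetic. The `rank_models` helper is hypothetical, and the log-likelihoods and parameter counts are placeholder numbers, not results from any real dataset.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits: dict) -> list:
    """Rank models by AIC.

    `fits` maps a model name to (maximized lnL, number of free parameters).
    Returns a list of (name, AIC, delta AIC) tuples, best model first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda row: row[1])


if __name__ == "__main__":
    # Placeholder values for illustration only; real values come from
    # fitting each model to your own alignment.
    fits = {
        "LG+G": (-12345.6, 40),
        "WAG+G": (-12410.2, 40),
        "JTT+G": (-12398.7, 40),
        "LG+G+F": (-12340.1, 59),
    }
    for name, score, delta in rank_models(fits):
        print(f"{name}\tAIC={score:.1f}\tdelta={delta:.1f}")
```

A model with extra parameters is only preferred when its gain in likelihood outweighs the penalty of 2 per added parameter.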

Go through and check your ortholog/paralog selections carefully. Add taxa if you are undersampling diversity (this is usually one of my main criticisms of many papers). If some taxa look problematic, do some tests to see why they are causing problems. If you do decide to remove any taxa or sequences, you probably need to explain why in the publication and have a good justification for it. If you do remove sequences or taxa, you also need to redo the phylogeny, as the removal may affect other branches and branch lengths.


Thank you for the information.

I completely agree with Josh and you, and I have taken your point about reconstructing the tree with different methods. I am evaluating my method of construction now, because earlier I used NJ and UPGMA (ClustalX) and also checked with 1000 bootstrap replicates. Unfortunately, the results did not look appropriate to me. I will now try ML and Bayesian methods (software) and experiment more. Model testing is a nice suggestion; I will use it to filter some results.

I should also mention that I am using alternatively spliced forms in the study, and there is likewise a chance of duplication events, which I have already checked, but only for a few genes. How does this make a difference in such studies? That is not clear to me.


Can you clarify how you are using alternative spliced forms in the study? Also are you doing your alignments and phylogeny based on nucleotide sequences or amino acids?

Paralog removal is primarily an issue when you only want to look at orthologs. Of course, if you are studying a gene family where duplications have occurred and you are interested in the subgroups, then what you want to ensure is that you have selected the right sequences, such that all members within an expected subfamily are orthologs of each other. The problem is that some organisms may have had a subsequent duplication (or duplications) within a subfamily. Divergence of one or more of those more recently duplicated sequences may cause them to behave unexpectedly within your larger alignment.
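One quick way to find candidates for such within-subfamily duplications is to list the species that contribute more than one sequence to a subfamily. The sketch below assumes tip labels of the form `species|gene_id`, a hypothetical convention chosen for illustration, along with the helper name and example labels. Splice isoforms of a single gene would also show up as multiple entries for one species, so each hit needs checking by hand.

```python
from collections import defaultdict


def multi_copy_species(subfamilies: dict) -> dict:
    """Find species with more than one sequence in a subfamily.

    `subfamilies` maps a subfamily name to a list of tip labels written as
    "species|gene_id". Returns {subfamily: {species: [gene_ids]}} restricted
    to species that appear more than once in that subfamily.
    """
    flagged = {}
    for family, tips in subfamilies.items():
        by_species = defaultdict(list)
        for tip in tips:
            species, _, gene_id = tip.partition("|")
            by_species[species].append(gene_id)
        repeated = {sp: genes for sp, genes in by_species.items()
                    if len(genes) > 1}
        if repeated:
            flagged[family] = repeated
    return flagged


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    subfamilies = {
        "subfamily_I": ["plant|g1", "plant|g2", "human|g3", "yeast|g4"],
        "subfamily_II": ["plant|g5", "human|g6", "yeast|g7"],
    }
    print(multi_copy_species(subfamilies))
```

Species flagged this way are where recent duplicates, or several isoforms of one gene, may be pulling a subfamily apart in the larger alignment.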
