Question: Filtering MSA for Phylogenetic Tree Construction
6.6 years ago by
Pappu1.9k wrote:

I am trying to filter an MSA for phylogenetic tree construction. I removed columns with >33% gaps and also sequences with >33% gaps. I also removed sequences with >70% sequence identity to reduce the number of sequences. Is this the correct way of doing it? Thanks.
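
For readers who want to reproduce this kind of fixed-threshold filtering, here is a minimal sketch in plain Python. It is not the poster's actual code. The function names, the order of the steps (columns first, then sequences, then redundancy), the greedy redundancy removal, and computing identity only over positions where neither sequence has a gap are all assumptions made for illustration.

```python
def gap_fraction(chars, gap_chars="-."):
    """Fraction of gap characters in a sequence or alignment column."""
    return sum(c in gap_chars for c in chars) / len(chars) if chars else 0.0


def drop_gappy_columns(aln, max_gap=0.33):
    """Keep only columns whose gap fraction is <= max_gap.

    aln is a dict mapping sequence name -> aligned sequence (equal lengths).
    """
    columns = list(zip(*aln.values()))
    keep = [i for i, col in enumerate(columns) if gap_fraction(col) <= max_gap]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}


def drop_gappy_sequences(aln, max_gap=0.33):
    """Keep only sequences whose gap fraction is <= max_gap."""
    return {name: seq for name, seq in aln.items() if gap_fraction(seq) <= max_gap}


def pairwise_identity(a, b, gap_chars="-."):
    """Identity over positions where neither sequence has a gap (assumed definition)."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in gap_chars and y not in gap_chars]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def drop_redundant(aln, max_identity=0.70):
    """Greedy pass: keep a sequence only if it is <= max_identity to all kept so far."""
    kept = {}
    for name, seq in aln.items():
        if all(pairwise_identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    toy = {"a": "ACGT-A", "b": "ACGT-A", "c": "AC-T-C", "d": "------"}
    step1 = drop_gappy_columns(toy)
    step2 = drop_gappy_sequences(step1)
    step3 = drop_redundant(step2)
    print(step1, step2, step3, sep="\n")
```

Note that the greedy pass depends on the input order, so a different ordering of the sequences can keep a different representative set.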

phylogenetics msa • 3.5k views
ADD COMMENTlink modified 2.5 years ago by al-ash130 • written 6.6 years ago by Pappu1.9k

It's difficult to say whether fixed parameters like that are a good idea; it depends on how similar the sequences you are analyzing are. It is probably better to use one of the available tools that also select sites based on, e.g., biochemical similarity, or that adapt "masking" parameters to the overall conservation of the alignment: trimAl, GUIDANCE, BMGE, Gblocks, or one of several other possibilities. You could use these programs, then take a look at the alignment to get a feel for the effect they are having. Ideally, you would infer at least a couple of trees using more and less conservative filtering to see how robust your results are to site selection.
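
One rough way to get a feel for how conservative a given gap cutoff is, before inferring any trees, is to count how many columns survive at several thresholds. The sketch below is an illustration only, not what any of the tools named above do internally; the function names and the default threshold values are hypothetical.

```python
def column_gap_fractions(seqs, gap_chars="-."):
    """Gap fraction of each alignment column; seqs is a list of aligned strings."""
    n = len(seqs)
    return [sum(c in gap_chars for c in col) / n for col in zip(*seqs)]


def sweep_gap_thresholds(seqs, thresholds=(0.1, 0.33, 0.5, 0.9)):
    """Map each maximum allowed gap fraction to the number of columns retained."""
    fractions = column_gap_fractions(seqs)
    return {t: sum(f <= t for f in fractions) for t in thresholds}


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGT-A", "AC-T-C", "------"]
    for threshold, kept in sweep_gap_thresholds(toy).items():
        print(f"max gap fraction {threshold}: {kept} columns kept")
```

If the number of retained columns changes sharply between two nearby thresholds, trees built on either side of that jump are natural candidates for the robustness comparison suggested above.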

ADD REPLYlink written 6.6 years ago by Tancata200
6.6 years ago by
Aldo60 wrote:

You should check the trimAl tool at

Here is the associated reference:

ADD COMMENTlink written 6.6 years ago by Aldo60
6.6 years ago by
Biojl1.7k wrote:

I agree with Aldo that trimAl is a very good tool to, let's say, "cut" alignments and get rid of the gappy regions. It might be OK for phylogeny, but it might not be optimal for other analyses.

I don't think removing sequences with >70% similarity is a good idea, since you are losing a lot of information, although that may depend on the species you are using to build the MSA and their divergence time from each other. Especially for very closely related species, the most informative alignments will be the most similar ones (>90%).

You may be interested in selecting isoforms before the MSA, thus obtaining better alignments that do not require that much trimming. That will not only improve your phylogenetic reconstruction but will also give you better results if you do further analyses with those MSAs, such as tests for positive selection. Reference:

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Biojl1.7k
6.6 years ago by
cts1.6k wrote:

It sounds good to me. These kinds of filtering steps can be specific to the alignment that you're using. You might need to vary the cutoffs for gaps and sequence similarity to get it right. You'll know from the topology of the tree if you've removed too many positions: removing too many variable positions will cause nodes to sit on top of each other. Based on the settings that you've used, though, I don't think that will happen.
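
A quick check to run before building the tree is to count how many variable and parsimony-informative columns remain after filtering. The sketch below assumes the usual definition of a parsimony-informative site (at least two character states, each present in at least two sequences, gaps ignored); the function name and the returned keys are hypothetical.

```python
from collections import Counter


def site_summary(seqs, gap_chars="-."):
    """Count total, variable, and parsimony-informative columns.

    seqs is a list of aligned strings of equal length; gaps are ignored
    when counting character states in a column.
    """
    variable = 0
    informative = 0
    for col in zip(*seqs):
        counts = Counter(c.upper() for c in col if c not in gap_chars)
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states seen in two or more sequences each.
            if sum(n >= 2 for n in counts.values()) >= 2:
                informative += 1
    return {
        "columns": len(seqs[0]) if seqs else 0,
        "variable": variable,
        "informative": informative,
    }


if __name__ == "__main__":
    before = ["AAGT", "AAGC", "ATCC", "ATCT"]
    after = ["AAG-", "AAGC", "ATGC", "ACGT"]
    print("before:", site_summary(before))
    print("after: ", site_summary(after))
```

Comparing the summary before and after filtering shows how much signal the chosen cutoffs removed; a large drop in informative columns is a hint that the thresholds may be too strict for that alignment.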

ADD COMMENTlink written 6.6 years ago by cts1.6k
2.5 years ago by
al-ash130 wrote:

I suggest checking this article on the benefits of MSA filtering for subsequent phylogeny reconstruction. You might decide not to filter at all after reading it :)

ADD COMMENTlink written 2.5 years ago by al-ash130