Question: Should I remove gaps from an alignment before building a phylogenetic tree?
17 months ago by
Seq22590 wrote:

I am working on horizontal gene transfer (HGT), and part of my pipeline is building phylogenetic trees. I use MAFFT to create the multiple sequence alignment (MSA) and RAxML to build the tree. Because the analysis looks for transfer of a gene from a distant clade, the alignment contains a lot of ambiguity, including many gaps. One common approach is to remove the gappy regions (to reduce noise) with automated tools such as Gblocks or trimAl.
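To make the idea of column filtering concrete, here is a minimal Python sketch that drops alignment columns whose gap fraction exceeds a cutoff. It is only an illustration of the general approach, not the algorithm used by Gblocks or trimAl; the function names, the toy sequences, and the 0.5 cutoff are all assumptions made for the example.

```python
def gap_fractions(alignment):
    """Return the fraction of '-' characters in each alignment column."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(length)]


def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose gap fraction is <= max_gap_fraction.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    fractions = gap_fractions(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment (name -> aligned sequence).
toy = {
    "taxonA": "ATG--CTA",
    "taxonB": "ATGA-CTA",
    "taxonC": "AT---CTA",
}
trimmed = drop_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in trimmed.items():
    print(name, seq)
```

Running the tree search on both the original and a filtered alignment like this is one way to see how much the trimming step changes the result.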

However, I came across this paper today: Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

So, what should I do now? The paper has been widely debated and spent more than four years in peer review. Interested readers can read this blog:



Gaps are as much a part of the alignment as the real characters, and RAxML (I believe) is one of the few tools that actually incorporates gap sites into its tree reconstruction process. Keeping the gaps will likely reduce your bootstrap support and may give you a tree that is harder to interpret, but that is pretty much to be expected with HGT.

How many sequences are you working with, and are you able to include more sequences in your dataset? That might help to improve the signal-to-noise ratio in your alignment.

written 17 months ago by Joe17k
