Should I remove gaps from an alignment before building a phylogenetic tree?
5.2 years ago
Seq225 ▴ 110

I am working on horizontal gene transfer (HGT) and, as part of my pipeline, building phylogenetic trees. I am using MAFFT to create the multiple sequence alignment (MSA) and RAxML to build the tree. Because the analysis asks whether a gene was transferred from a distant clade, the alignment contains many ambiguous regions, including lots of gaps. One common approach is to remove the gappy regions (to reduce noise) using automated tools such as Gblocks or trimAl.
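To make the column-filtering idea concrete, here is a minimal Python sketch of the simplest version of it: dropping alignment columns whose gap fraction exceeds a threshold. It is only an illustration, not how Gblocks or trimAl work internally; the function names, the toy alignment, and the 0.5 threshold are all assumptions made up for this example.

```python
def gap_fractions(alignment, gap_chars="-"):
    """Return the fraction of gap characters in each alignment column.

    `alignment` is a dict mapping sequence name -> aligned sequence;
    all sequences are assumed to have the same length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(s[i] in gap_chars for s in seqs) / len(seqs)
        for i in range(length)
    ]


def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose gap fraction is <= max_gap_fraction.

    The 0.5 default is an arbitrary illustrative value, not a recommendation.
    """
    fractions = gap_fractions(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (four sequences, five columns).
toy = {
    "seq_a": "MK-LV",
    "seq_b": "MK-LV",
    "seq_c": "MKALV",
    "seq_d": "M--LV",
}

trimmed = drop_gappy_columns(toy, max_gap_fraction=0.5)
```

Here the third column is 75% gaps and gets removed, while the second column (25% gaps) is kept. Running the tree inference on both `toy` and `trimmed` style inputs is one way to see how much the filtering step actually changes the result for a given gene.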

However, I came across this paper today: Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference https://www.ncbi.nlm.nih.gov/pubmed/26031838

So, what should I do now? The paper has been widely debated and spent more than four years in peer review. Anyone interested can read this blog post: http://lab.dessimoz.org/blog/2015/08/27/filtering-alignments

Thanks.

alignment genome gene next-gen RNA-Seq • 2.7k views

Gaps are as much a part of the alignment as the real characters. RAxML (I believe) is one of the few tools that actually incorporates gap sites into its tree reconstruction process. Leaving the gaps in will likely reduce your bootstrap support and may give you a harder-to-interpret tree, but that's pretty much to be expected with HGT.

How many sequences are you working with, and are you able to include more sequences in your dataset? That might help to improve the signal-to-noise ratio in your alignment.
