Gaps/missing data treatment when making tree
9.4 years ago

I currently have a dataset where the majority of my sequences are around 550 bp long, but a couple of sequences are missing about 200 bp of this. The alignment has no gaps apart from this missing sequence. From what was explained to me, I thought 'Pairwise deletion' would create a tree that shows no variation between two sequences of different lengths if they are identical over the parts that do overlap/align. However, when I make an NJ tree using pairwise deletion, I see variation that is due to the difference in sequence length rather than differences in the aligned sequence. Other than complete deletion, is there a gaps/missing data treatment or a different statistical method that will not produce variation in the tree arising from sequences being different lengths? Thank you for your help. Apologies if this is a very basic question, I am new to phylogenetics.
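To make the expectation in the question concrete, here is a minimal sketch of a p-distance computed with pairwise deletion. It is an illustration only, not the code of any particular tree-building program: the function name `p_distance_pairwise_deletion`, the toy sequences, and the choice to treat `-`, `?` and `N` as missing data are all assumptions made for this example.

```python
def p_distance_pairwise_deletion(seq_a, seq_b, missing="-?Nn"):
    """Proportion of differing sites, skipping any site where either
    sequence has a missing-data symbol (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        # Pairwise deletion: drop the site for this pair only.
        if a in missing or b in missing:
            continue
        compared += 1
        if a.upper() != b.upper():
            mismatches += 1
    if compared == 0:
        return float("nan")  # no overlapping sites to compare
    return mismatches / compared


# Toy alignment (made up for illustration).
full = "ACGTACGTAC"    # full-length sequence
short = "ACGTAC----"   # same sequence, missing the last four sites
other = "ACGTACGTTT"   # differs from `full` only in the last two sites

print(p_distance_pairwise_deletion(full, short))   # 0.0
print(p_distance_pairwise_deletion(full, other))   # 0.2
print(p_distance_pairwise_deletion(short, other))  # 0.0
```

In this toy case the truncated sequence is at distance 0 from its full-length copy, as expected. Its distance to the third sequence, however, is computed over fewer sites (0.0 instead of 0.2), so the two copies no longer have identical rows in the distance matrix. That is one plausible reason an NJ tree can still separate them under pairwise deletion, although the exact behaviour depends on the software and substitution model used.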

phylogenetics tree • 3.6k views
9.3 years ago
kloetzl ★ 1.1k

Since you are using NJ for the tree, alignment-free distance estimation methods may help. Most of them only count SNPs (or estimate substitution rates) and ignore large gaps. Here are three tools that you can try.

