Why is it important to remove duplicate sequences before ML tree construction?
7.6 years ago

Hi everyone!

I am performing a spatio-temporal analysis of 652 viral samples. I have read that, as part of the process, I should remove duplicate sequences prior to ML tree construction. However, I have not found any information that supports this procedure. Moreover, it would be a problem here, because we could end up removing samples that are identical in sequence but come from different years and/or locations. As far as I understand, the main issue is computational cost, while other reasons relate to technical problems that some programs have with duplicate sequences. It would be really great to understand whether this step is actually necessary. Thanks!

viral evolution analysis Tree construction gene • 3.0k views

To make a tree you will need an alignment, but many or all alignment programs will not work with input that contains duplicates (sequences which do not differ from each other), especially if they also have the same headers.

The corresponding branches of the tree would have to be at the same place simultaneously, and tree-building programs don't like that.
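If you want to see whether your input actually has either problem, here is a minimal sketch (plain Python, no external libraries) that reports repeated headers and groups of identical sequences. The function names `parse_fasta` and `find_duplicates` and the example records are made up for illustration, and the comparison is a simple case-insensitive exact match on unaligned sequence strings, which may not be how any particular aligner or tree program defines a duplicate.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_duplicates(records):
    """Return (repeated_headers, groups_of_headers_sharing_a_sequence)."""
    header_counts = defaultdict(int)
    by_sequence = defaultdict(list)
    for header, seq in records:
        header_counts[header] += 1
        by_sequence[seq.upper()].append(header)
    repeated_headers = sorted(h for h, n in header_counts.items() if n > 1)
    identical_groups = [names for names in by_sequence.values() if len(names) > 1]
    return repeated_headers, identical_groups


# Hypothetical example data: two identical sequences and one repeated header.
example = """>sampleA_2015_Lima
ACGTACGT
>sampleB_2017_Quito
ACGTACGT
>sampleC_2016_Lima
ACGTTCGT
>sampleA_2015_Lima
ACGTACGA
"""

records = parse_fasta(example)
repeated_headers, identical_groups = find_duplicates(records)
print("Repeated headers:", repeated_headers)
print("Identical sequence groups:", identical_groups)
```

Repeated headers are worth fixing regardless of what you decide about identical sequences, since two tips with the same name cannot be told apart in the resulting tree.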


Am I wrong in thinking that identical (duplicate) sequences should in fact be very easy to align? Multiple sequence aligners will have no problem, as far as I am aware. In that case the problem lies entirely with the phylogenetic inference software, but I have been unable to find any discussion of why this is the case. Why do tree-building programs not like identical sequences?

The following question from the FAQ of IQ-TREE suggests that the answer has something to do with the ability to calculate bootstrap support:

How does IQ-TREE treat identical sequences?

But I feel that this is still not a proper explanation.
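Whatever the underlying reason turns out to be, the concern in the original question (losing samples from different years or locations) can be handled by collapsing identical sequences while recording which samples each kept sequence stands for. The following is only a rough sketch under my own assumptions: `collapse_identical` is a hypothetical helper, the sample names are invented, names are assumed to be unique, and "identical" means an exact case-insensitive string match.

```python
def collapse_identical(records):
    """Collapse identical sequences, keeping the first record of each group.

    records: list of (name, sequence) pairs with unique names.
    Returns (unique_records, members), where members maps each kept
    (representative) name to every original name sharing its sequence.
    """
    representative = {}  # sequence -> name of the first record seen with it
    members = {}         # representative name -> all names in its group
    unique_records = []
    for name, seq in records:
        key = seq.upper()
        if key not in representative:
            representative[key] = name
            members[name] = [name]
            unique_records.append((name, seq))
        else:
            members[representative[key]].append(name)
    return unique_records, members


# Hypothetical samples: s1 and s2 are identical but differ in year and place.
records = [
    ("s1|2015|Lima", "ACGTACGT"),
    ("s2|2017|Quito", "ACGTACGT"),
    ("s3|2016|Lima", "ACGTTCGT"),
]

unique_records, members = collapse_identical(records)
for name, group in members.items():
    print(name, "represents", group)
```

The tree is then built from `unique_records` only, and `members` lets you attach the year and location of every collapsed sample back to its representative tip afterwards, so no metadata is thrown away.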
