Question: Reducing the Number of Sequences for Phylogenetic Tree Construction
Pappu wrote (6.0 years ago):

I got several thousand sequences from a blastp search, so I removed sequences with >90% identity using CD-HIT before building the MSA, and did the same again after MSA construction. My assumption was that sequences with >90% identity would end up on closely related branches anyway. I am wondering whether this cutoff makes sense.
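The redundancy filter described in the question can be sketched as a greedy, longest-first clustering in the spirit of CD-HIT. This is a minimal sketch, not CD-HIT itself: it assumes the sequences are already aligned (as in the post-MSA filtering step) and counts identity only over columns where neither sequence has a gap, whereas CD-HIT works on unaligned sequences with its own alignment and word-filter heuristics, so its clusters will not match exactly. `pairwise_identity` and `greedy_identity_filter` are hypothetical names, and treating identity at or above the threshold as redundant is this sketch's own choice.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_identity_filter(aligned: dict[str, str], threshold: float = 0.9) -> list[str]:
    """Return one representative name per group of near-identical sequences.

    Sequences are visited longest first (ungapped length); each one is kept only if
    its identity to every representative kept so far is below `threshold`."""
    order = sorted(aligned, key=lambda name: len(aligned[name].replace("-", "")), reverse=True)
    representatives: list[str] = []
    for name in order:
        if all(pairwise_identity(aligned[name], aligned[rep]) < threshold for rep in representatives):
            representatives.append(name)
    return representatives


# Toy alignment (made-up sequences): b is 90% identical to a, d is a gapped copy of a.
aligned = {"a": "ACDEFGHIKL", "b": "ACDEFGHIKM", "c": "ACDWWWWWKL", "d": "ACDEFGHI--"}
print(greedy_identity_filter(aligned))  # ['a', 'c']
```

Running such a filter once before and once after alignment, as in the question, can give slightly different sets, because identities computed on an MSA depend on where the gaps were placed.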

A probably more justified way of reducing the number of sequences would be to first build a distance-based tree (NJ, UPGMA) for the whole set of sequences. You could then use Dendroscope 3 or iTOL to automatically collapse clades containing very closely related sequences. During auto-collapsing, the average branch length to all leaves is calculated for every internal node, and clades where this value is below your threshold are collapsed. Alternatively, you can specify your own threshold on support values or on node length.

(Comment by a.zielezinski, 6.0 years ago.)
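The tree-then-collapse approach from this comment can be sketched in plain Python. This is a minimal sketch under several assumptions: distances are simple p-distances (proportion of differing residues over gap-free columns of an existing alignment), the tree is built with UPGMA rather than NJ only because UPGMA is shorter to write, and collapsing follows the rule described in the comment (average path length from an internal node to all of its leaves, compared against a threshold), applied top-down so that only the largest qualifying clades are collapsed. It is not the Dendroscope or iTOL implementation; `p_distance`, `upgma`, `leaf_distances` and `collapse_clades` are hypothetical names, and keeping the alphabetically first member of each clade as its representative is an arbitrary choice.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0


def upgma(dist: dict[frozenset, float], names: list[str]):
    """Build a UPGMA tree. A leaf is a name (str); an internal node is a tuple of
    (child, branch_length) pairs."""
    clusters = {n: (n, 1, 0.0) for n in names}  # cluster id -> (subtree, n_leaves, height)
    d = dict(dist)
    for step in range(len(names) - 1):
        pair = min(d, key=d.get)  # closest remaining pair of clusters
        a, b = tuple(pair)
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d[pair] / 2  # UPGMA places the new node at half the merge distance
        new = f"_node{step}"  # hypothetical internal-node id
        # Distance from the merged cluster to every other cluster: size-weighted average.
        for c in clusters:
            d[frozenset((new, c))] = (na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[new] = (((ta, height - ha), (tb, height - hb)), na + nb, height)
    return next(iter(clusters.values()))[0]


def leaf_distances(node) -> dict[str, float]:
    """Path length from `node` down to each leaf beneath it."""
    if isinstance(node, str):
        return {node: 0.0}
    out = {}
    for child, length in node:
        for leaf, dist in leaf_distances(child).items():
            out[leaf] = dist + length
    return out


def collapse_clades(node, threshold: float) -> list[list[str]]:
    """Top-down: collapse a clade if its mean node-to-leaf distance is below threshold."""
    dists = leaf_distances(node)
    if isinstance(node, str) or sum(dists.values()) / len(dists) < threshold:
        return [sorted(dists)]
    return [clade for child, _ in node for clade in collapse_clades(child, threshold)]


# Toy alignment (made-up sequences): A/B and C/D each differ at one of 20 columns.
aligned = {
    "A": "MKTAYIAKQRQISFVKSHFS",
    "B": "MKTAYIAKQRQISFVKSHFA",
    "C": "MSTGWLAEQRQLSYVRDHWS",
    "D": "MSTGWLAEQRQLSYVRDHWA",
}
names = list(aligned)
dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]]) for p in combinations(names, 2)}
clades = collapse_clades(upgma(dist, names), threshold=0.05)
representatives = [clade[0] for clade in clades]  # one sequence kept per collapsed clade
print(sorted(representatives))  # ['A', 'C']
```

Because a UPGMA tree is ultrametric, the average node-to-leaf distance of a clade is just half the distance at which its two subclusters were joined, so a threshold of 0.05 collapses clades whose subclusters differ by less than about 10% on average (roughly the 90% identity cutoff from the question). In an NJ tree the leaves are not equidistant from a node, which is why the sketch averages explicit path lengths instead of reading off node heights.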
DG wrote (6.0 years ago):

I'll preface my answer with "it depends." If you were looking at strains of bacteria, for instance, the 90% cut-off might be too low for the question you are trying to answer. But for most applications of phylogenetics, collapsing at 90% sequence identity is generally considered fairly routine. If you need to prune your number of taxa further, the suggestion by @a.zielezinski is worth looking into. Generally, you want to prune taxa when you need to make the dataset more manageable in size for alignment and phylogeny estimation, while retaining as much real sequence diversity as possible.
