I am trying to build a large phylogenetic tree involving about 800 OTUs. I think I need to reduce the number of OTUs in the tree by choosing some representative species, but I am not sure how to do it. Does anyone know how representative species should be chosen? Or is there a paper on this issue? Thanks a lot!
You can easily generate phylogenies of 800 OTUs using tools such as RAxML. If you really want to reduce your set of OTUs, you will need to choose a similarity threshold between the sequences, and that choice is rather subjective. If you do go that route, useful tools are CD-HIT and its companion cdhit-cluster-consensus, which create non-redundant sets by collapsing sequences that are more similar than your arbitrary threshold.
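To make the thresholding idea concrete, here is a rough Python sketch of greedy representative picking. It is not CD-HIT's actual algorithm (CD-HIT uses short-word filtering and its own alignment); it only illustrates how the threshold decides what gets collapsed. It assumes the sequences are already aligned to equal length, and the names `identity`, `pick_representatives`, and the toy OTU IDs are made up for illustration.

```python
def identity(a, b):
    """Fraction of matching columns between two aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def pick_representatives(seqs, threshold=0.97):
    """Greedy clustering; returns {representative_id: [member_ids]}.

    seqs maps an OTU id to its aligned sequence. Sequences are visited from
    most to fewest non-gap characters (ties broken by id), and each one either
    joins the first representative it matches at >= threshold or becomes a
    new representative itself.
    """
    order = sorted(seqs, key=lambda name: (-len(seqs[name].replace("-", "")), name))
    clusters = {}
    for name in order:
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy data, purely illustrative.
    toy = {
        "otu1": "ACGTACGTAC",
        "otu2": "ACGTACGTAT",
        "otu3": "TTGTACCTGA",
    }
    for rep, members in pick_representatives(toy, threshold=0.85).items():
        print(rep, members)
```

Raising or lowering `threshold` changes how many representatives survive, which is exactly why the choice is subjective. For a real data set you would run CD-HIT itself rather than something like this.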