I have a very large multiple sequence alignment (~30,000 sequences) that was created using MAFFT. Since MAFFT is very fast but not the most accurate aligner, I am now trying to select a set of "representative" sequences (300 or so) from this alignment. The aim is to align these representatives with a more accurate alignment tool, and then work out how to align the remaining sequences to those representatives.
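To make the goal concrete, here is a minimal sketch of one naive way to pick representatives: a greedy pass that keeps a sequence only if it is not too similar to any sequence already kept. It is an illustration under stated assumptions, not an established method. The function names, the `max_identity` parameter, and the choice to measure identity over ungapped columns of the existing alignment are all hypothetical.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither row has a gap."""
    matches = compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns that are gapped in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def pick_representatives(aligned, max_identity=0.9):
    """Greedy selection (assumed scheme): keep a sequence only if its identity
    to every representative kept so far is below max_identity.

    `aligned` maps sequence name -> aligned sequence string (equal lengths).
    """
    reps = []
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, aligned[r]) < max_identity for r in reps):
            reps.append(name)
    return reps


# Toy alignment: "b" is a near-duplicate of "a", "c" is clearly different.
toy = {"a": "ACGT-ACGT", "b": "ACGT-ACGA", "c": "TTTTTTTTT"}
reps = pick_representatives(toy, max_identity=0.8)
```

The threshold does not fix the number of representatives directly, so it would have to be tuned (lowered for fewer, raised for more) to land near 300, and the result depends on the input order.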

I assume this sort of thing has been done before, but I am having trouble finding any literature on the subject, so I was hoping someone could point me in the right direction :)


