5.3 years ago by
The 'quality' of an alignment is somewhat ambiguous, because an alignment is an inference of homology and we may or may not have a good idea of which sites are truly homologous in any given sequence set. Alignment algorithms compute an alignment score by assigning values to matches, mismatches, gap openings (insertions/deletions), and gap extensions. Two alignments of the same sequences can then be compared simply by comparing their scores. However, the scoring scheme itself is largely arbitrary, so a higher score under one scheme does not guarantee a more accurate alignment.
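To make the scoring idea concrete, here is a minimal sketch of a sum-of-pairs score with affine gap penalties. The function names and the default values (`match=1`, `mismatch=-1`, `gap_open=-3`, `gap_extend=-1`) are my own assumptions for illustration; this is not the objective function used by MAFFT or any other particular aligner.

```python
from itertools import combinations


def pairwise_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score two aligned sequences (equal length, '-' for gaps).

    The first column of a gap run costs gap_open; each further column
    of the same run costs gap_extend. Illustrative values only.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap = None  # which sequence carried a gap in the previous scored column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap in both sequences: nothing to score for this pair
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


def sum_of_pairs(alignment, **scheme):
    """Sum the pairwise score over every pair of rows in the alignment."""
    return sum(pairwise_score(a, b, **scheme) for a, b in combinations(alignment, 2))


# Two candidate alignments of the same pair of sequences (ACGT and ACTT).
ungapped = ["ACGT", "ACTT"]
gapped = ["ACGT-", "AC-TT"]

# Under the default scheme the ungapped alignment scores higher ...
default_scores = (sum_of_pairs(ungapped), sum_of_pairs(gapped))
# ... but with costly mismatches and cheap gaps the ranking flips.
alt = dict(mismatch=-5, gap_open=-1)
alt_scores = (sum_of_pairs(ungapped, **alt), sum_of_pairs(gapped, **alt))
```

Which of the two alignments "wins" depends entirely on the penalties chosen, which is the sense in which the scoring scheme is arbitrary.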
If you are concerned with the quality of your hand-curated alignment (and you may not need to be - expert 'by eye' alignments are often considered acceptable!), I would take your aligner of choice (MAFFT, perhaps?), compute the score of your alignment under its scoring scheme, and compare that to the score of the alignment produced or refined by the program.
One other concern: excluding misleading sites is one thing (the program Gblocks will remove regions thought to result from spurious alignment), but removing non-informative sites can affect your analysis. In model-based phylogenetic approaches, invariant or slowly evolving sites are part of the model, either as a proportion of invariable sites or through the gamma distribution that models among-site rate heterogeneity, so discarding them can distort the estimates of those parameters. If you are selecting only variable sites, it may be inappropriate to concatenate them and apply a single model or a single evolutionary history. If this is the case, I would recommend SNAPP by David Bryant and others, which estimates species trees from SNP data while treating individual gene trees as nuisance parameters.
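Before dropping anything, it can help to see how much of the alignment is actually invariant. The sketch below splits columns into variable and invariant ones; the function name `split_sites` and the choice to treat `-`, `?`, and `N` as missing data are my own assumptions, and they only make sense for nucleotide data. Note that the observed fraction of constant columns is not the same thing as the proportion-of-invariable-sites parameter a model would estimate.

```python
def split_sites(alignment, missing="-?N"):
    """Return (variable, invariant) lists of 0-based column indices.

    A column is 'variable' if it shows more than one state once missing
    characters are ignored. Columns with only missing data end up in
    'invariant', since they show no observed variation.
    """
    variable, invariant = [], []
    for i, column in enumerate(zip(*alignment)):
        states = {c.upper() for c in column if c.upper() not in missing}
        (variable if len(states) > 1 else invariant).append(i)
    return variable, invariant


# Toy nucleotide alignment, three taxa and five sites.
aln = ["ACGTA",
       "ACGTT",
       "ACCTA"]

variable, invariant = split_sites(aln)
fraction_invariant = len(invariant) / len(aln[0])

# Keeping only the variable columns throws away most of this alignment.
variable_only = ["".join(seq[i] for i in variable) for seq in aln]
```

In this toy case three of the five columns are constant, so a variable-sites-only matrix would hide most of the information about how slowly the sequence is evolving.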
Hope this helps.