I am aligning many similar sequences from a BLAST result and looking for mutations at certain positions. My inclination is that an MSA (Clustal Omega) is the best approach, but my PI is worried about misalignments and believes that pairwise alignments against a reference sequence would be the best approach. Assuming that all the sequences to be aligned are homologs, which method would be more accurate, and why? I need to convince her that I am right, i.e. that more information will produce better alignments. Thanks!
The outputs of the two methods are radically different. We perform multiple sequence alignment when we are looking for conserved regions across all the sequences. MSAs are not well suited to characterizing differences unless these also form conserved blocks.
What you most likely need is both methods: compile the differences versus a reference genome, and produce an MSA across all sequences.
Also note that the word homolog doesn't actually imply any threshold of similarity, only a shared ancestry.
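The "compile differences versus a reference" step suggested above can be sketched roughly as follows. This is a minimal illustration, not a recommended tool: it uses a plain Needleman-Wunsch global alignment written from scratch, and the function names (`global_align`, `differences`) and the scoring values are assumptions chosen for the example. For real data you would normally use an established aligner.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the reference and query as two gapped strings of equal length.
    The scoring values are illustrative assumptions only.
    """
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal on ties.
    gapped_ref, gapped_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            gapped_ref.append(ref[i - 1])
            gapped_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            gapped_ref.append(ref[i - 1])
            gapped_query.append("-")
            i -= 1
        else:
            gapped_ref.append("-")
            gapped_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(gapped_ref)), "".join(reversed(gapped_query))


def differences(ref, query):
    """List (ref_position, ref_char, query_char) for every aligned column that differs.

    Positions are 1-based reference coordinates; an insertion in the query is
    reported at the position of the reference base that precedes it.
    """
    gapped_ref, gapped_query = global_align(ref, query)
    pos = 0
    diffs = []
    for r, q in zip(gapped_ref, gapped_query):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs


if __name__ == "__main__":
    reference = "ACGTACGT"
    print(differences(reference, "ACGTTCGT"))  # one substitution
    print(differences(reference, "ACGACGT"))   # one deleted base
```

Because every query is reported in the coordinates of the same reference, the positions you care about mean the same thing for every sequence, which is the main practical advantage over reading columns out of an MSA.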
To identify mutants in your sequences, pairwise alignment against the reference genome is the best approach, because:
- An MSA of your sequences will produce many more mismatches, and not all of them will be true mutations.
- If you align your sequences to the genome, many more sequences will align at any particular position. From that aligned data you can easily find the variants between your sequences and the reference genome. In this case you can claim true variants/mutants, because you will have a larger number of sequences (high depth) supporting them.
- MSA is not a good approach for finding variants, as it will not give good coverage for your dataset.
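The depth argument in the list above can be made concrete with a small tally. The sketch below assumes you already have pairwise alignments against the reference, given as hypothetical `(ref_start, gapped_ref, gapped_query)` triples with a 1-based start position; the function names and the `min_depth` / `min_fraction` thresholds are illustrative assumptions, not values from any particular tool. Insertions relative to the reference are skipped to keep the example short.

```python
from collections import Counter, defaultdict


def pileup(alignments):
    """Count what each query shows at every reference position.

    `alignments` is an iterable of (ref_start, gapped_ref, gapped_query),
    where ref_start is the 1-based reference position of the first aligned
    reference base. Returns (counts, ref_bases).
    """
    counts = defaultdict(Counter)   # position -> Counter of query characters
    ref_bases = {}                  # position -> reference base
    for ref_start, gapped_ref, gapped_query in alignments:
        pos = ref_start - 1
        for r, q in zip(gapped_ref, gapped_query):
            if r == "-":
                continue            # insertion in the query: ignored in this sketch
            pos += 1
            ref_bases[pos] = r
            counts[pos][q] += 1     # q may be "-" for a deletion
    return counts, ref_bases


def call_variants(alignments, min_depth=3, min_fraction=0.5):
    """Report (position, ref_base, alt, alt_count, depth) for supported differences."""
    counts, ref_bases = pileup(alignments)
    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        if depth < min_depth:
            continue                # too few sequences cover this position
        for alt, n in sorted(counts[pos].items()):
            if alt != ref_bases[pos] and n / depth >= min_fraction:
                calls.append((pos, ref_bases[pos], alt, n, depth))
    return calls


if __name__ == "__main__":
    example = [
        (1, "ACGT", "ACGT"),
        (1, "ACGT", "ATGT"),
        (2, "CGT", "TGT"),   # partial alignment starting at reference position 2
        (1, "ACGT", "AC-T"),
    ]
    print(call_variants(example))
```

A difference seen in only one sequence at a position covered by many is filtered out by the fraction threshold, while one seen repeatedly is reported together with its supporting depth.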