Question: Pairwise vs. Multiple Sequence Alignments: Which has better accuracy?
6.1 years ago by
European Union
weslfield 90 wrote:

I am aligning many similar sequences from a BLAST result and looking for mutations at certain positions. My inclination is that an MSA (Clustal Omega) is the best approach, but my PI is worried about misalignments and believes that pairwise alignments against a reference sequence would be better. Assuming that all the sequences to be aligned are homologs, which method would be more accurate, and why? I need to convince her that I am right, i.e. that more information will produce better alignments. Thanks!

modified 4.9 years ago by Biostar ♦♦ 20 • written 6.1 years ago by weslfield 90

It'd be helpful if you provided more information. Are you performing local realignment around indels in the BLAST results before calling variants (doing this should produce results similar to using MSA on those regions)? How certain are you that the section of the reference that you're interested in matches the sequences you're blasting? If this data was derived from a PCR that you strongly believe is specific, then MSA might work OK. If you have many off-target sequences, however, then you're going to run into problems.

written 6.1 years ago by Devon Ryan 97k

These are sequences extracted from a metagenomic sample, targeting a gene of interest using BLASTp with relatively high bit-score and identity cut-offs, so all of the sequences to be aligned are very similar. I need to create an alignment to check for mutations based on their position in a reference sequence, so I can either do a pairwise alignment of each sequence with the reference, or do an MSA that includes the reference and then use the reference row to identify the MSA columns to iterate over. This is all being done within a script because there are thousands of sequences to examine.
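
For the MSA route, the bookkeeping described above (translating reference positions into MSA columns) might look roughly like the sketch below. It assumes the alignment has already been loaded into a dict of equal-length gapped strings; the names `reference_column_map` and `residues_at`, the `"ref"` ID and the toy sequences are all made up for illustration and are not from any particular library.

```python
def reference_column_map(aligned_ref, gap="-"):
    """Map 1-based reference positions to 0-based MSA column indices."""
    mapping = {}
    pos = 0
    for col, char in enumerate(aligned_ref):
        if char != gap:  # gap columns do not consume a reference position
            pos += 1
            mapping[pos] = col
    return mapping


def residues_at(msa, ref_id, positions, gap="-"):
    """Return {seq_id: {ref_pos: residue}} for the requested reference positions."""
    colmap = reference_column_map(msa[ref_id], gap)
    return {
        seq_id: {p: row[colmap[p]] for p in positions}
        for seq_id, row in msa.items()
        if seq_id != ref_id
    }


# Toy alignment (hypothetical data): the reference has a gap in the third column.
msa = {
    "ref":  "MK-LVA",
    "seq1": "MKTLVA",
    "seq2": "MR-LIA",
}
calls = residues_at(msa, "ref", positions=[2, 4])
```

Here `calls["seq2"]` holds the residues of `seq2` at reference positions 2 and 4, regardless of how many gap columns the MSA inserted into the reference row.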

written 6.1 years ago by weslfield 90
6.1 years ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

The outputs of the two methods are radically different. We perform multiple sequence alignment when we are looking for conserved regions across all the sequences. MSAs are not well suited to characterizing differences unless these also form conserved blocks.

What you most likely need are both methods. Compile differences versus a reference genome and produce MSA across all sequences.
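
As a rough illustration of "compiling differences versus a reference", here is a minimal pure-Python sketch of a global (Needleman-Wunsch) pairwise alignment followed by a scan for mismatches and indels. The scoring values (match 1, mismatch -1, linear gap -2) and the function names are assumptions chosen for the example; for real protein data you would more likely use an established aligner with a substitution matrix and affine gap penalties.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def sub(a, b):
        return match if a == b else mismatch

    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(ref[i - 1], query[j - 1]),
                score[i - 1][j] + gap,   # gap in the query (deletion)
                score[i][j - 1] + gap,   # gap in the reference (insertion)
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a_ref, a_qry = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(ref[i - 1], query[j - 1]):
            a_ref.append(ref[i - 1]); a_qry.append(query[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1]); a_qry.append("-"); i -= 1
        else:
            a_ref.append("-"); a_qry.append(query[j - 1]); j -= 1
    return "".join(reversed(a_ref)), "".join(reversed(a_qry))


def differences(a_ref, a_qry):
    """List (ref_pos, ref_residue, query_residue) for every mismatch or indel.

    ref_pos is 1-based; an insertion is reported at the last reference
    position before it (0 if it precedes the whole reference).
    """
    diffs = []
    pos = 0
    for r, q in zip(a_ref, a_qry):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs


# Toy sequences (hypothetical data).
sub_diffs = differences(*global_align("MKLVA", "MRLVA"))
del_diffs = differences(*global_align("MKLVA", "MKVA"))
```

Because every query is aligned to the same reference independently, the reported positions are directly comparable across queries, which is the property the pairwise approach relies on.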

Also note that the word homolog doesn't actually imply any threshold of similarity, only a shared ancestry.


modified 6.1 years ago • written 6.1 years ago by Istvan Albert ♦♦ 85k
6.1 years ago by
United States
Renesh 1.9k wrote:

To identify mutations in your sequences, pairwise alignment against the reference genome is the best approach, because:

  1. An MSA of your sequences will produce many more mismatches, and these may not be true mutations.
  2. If you align your sequences to the genome, many sequences will align to each position. From that aligned data, you can easily find the variants between your sequences and the reference genome. In this case you can claim true variants/mutations, because each call is supported by a larger number of sequences (high depth).
  3. MSA is not a good approach for finding variants, as it will not give good coverage for your dataset.
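
The depth argument in point 2 can be sketched as a per-position tally over many pairwise alignments. The snippet below assumes each query has already been globally aligned to the same reference and is available as a pair of gapped strings; `tally_by_reference_position`, `supported_variants`, the `min_support` threshold and the toy data are all hypothetical names and values for illustration. Insertions relative to the reference are simply skipped in this sketch.

```python
from collections import Counter, defaultdict


def tally_by_reference_position(pairs, gap="-"):
    """Count the query residues observed at each 1-based reference position."""
    counts = defaultdict(Counter)
    for a_ref, a_qry in pairs:
        pos = 0
        for r, q in zip(a_ref, a_qry):
            if r == gap:
                continue  # insertion relative to the reference: not tallied here
            pos += 1
            counts[pos][q] += 1
    return counts


def supported_variants(pairs, reference, min_support=2):
    """Keep non-reference residues seen in at least min_support sequences."""
    variants = {}
    for pos, counter in sorted(tally_by_reference_position(pairs).items()):
        ref_res = reference[pos - 1]
        alt = {res: n for res, n in counter.items()
               if res != ref_res and n >= min_support}
        if alt:
            variants[pos] = alt
    return variants


# Toy pairwise alignments (hypothetical data): (gapped reference, gapped query).
reference = "MKLVA"
pairs = [
    ("MKLVA", "MRLVA"),
    ("MKLVA", "MRLVA"),
    ("MKLVA", "MKLIA"),
    ("MKLVA", "MK-VA"),
]
variants = supported_variants(pairs, reference)
```

With `min_support=2`, only the K-to-R change at position 2 survives, since the other two differences are each seen in a single sequence.
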
written 6.1 years ago by Renesh 1.9k