How to align two sequences containing ambiguous IUPAC letters using Biopython?


4.3 years ago

ACGC ▴ 10

My aim is to have alignments like this:

 ACTGACTGRTGC
 |||||||||||| 
 ACTGACTGATGC

(R stands for A or G.) At the moment, however, my code treats ambiguous IUPAC letters as mismatches:

ACTGACTGRTGC
||||||||.|||
ACTGACTGATGC

My code:

from Bio.Seq import Seq
from Bio import Align

# Bio.Alphabet was removed in Biopython 1.78, so Seq no longer takes an
# alphabet argument. On older releases IUPAC lived in Bio.Alphabet, not Bio.Seq.
seq1 = Seq("ACTGACTGRTGC")
seq2 = Seq("ACTGACTGATGC")

aligner = Align.PairwiseAligner()
alignments = aligner.align(seq1, seq2)
for alignment in sorted(alignments):
    print(alignment.score)
    print(alignment)

Do you have any hints on how to get ambiguous DNA letters aligned as matches?

alignment Biopython • 1.9k views



Thanks to you all! I reached my aim by adding my own substitution matrix, which is used by a callback function passed as an argument to the pairwise2 align function :) @Joe: In this case a dot stands for a mismatch, as format_alignment() in pairwise2.py shows.

4.3 years ago by ACGC ▴ 10


Perhaps you could specify a custom substitution matrix as a parameter to the aligner that handles IUPAC ambiguities as matches, so that an alignment of A to R, for instance, would score as well as A to A (and likewise for G).
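A minimal sketch of this suggestion, assuming a recent Biopython release where `Bio.Align.PairwiseAligner` accepts a `substitution_matrix`. The helper name `build_iupac_matrix`, the +1/-1 scores, the gap penalties, and the rule "two letters match if their base sets overlap" are illustrative assumptions, not something taken from the thread. Under that rule N (and X) match every letter.

```python
from Bio import Align
from Bio.Align import substitution_matrices
from Bio.Data import IUPACData

# Map each IUPAC DNA letter to the set of bases it can stand for, e.g. R -> {A, G}.
BASES = {code: set(bases) for code, bases in IUPACData.ambiguous_dna_values.items()}
ALPHABET = "".join(sorted(BASES))


def build_iupac_matrix(match=1.0, mismatch=-1.0):
    """Hypothetical helper: score a pair as a match if the base sets overlap."""
    matrix = substitution_matrices.Array(ALPHABET, dims=2)
    for a in ALPHABET:
        for b in ALPHABET:
            matrix[a, b] = match if BASES[a] & BASES[b] else mismatch
    return matrix


aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.substitution_matrix = build_iupac_matrix()
# Assumed gap penalties; set explicitly so the result does not depend on defaults.
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -1.0

# R aligned to A now contributes the same score as A aligned to A.
score = aligner.score("ACTGACTGRTGC", "ACTGACTGATGC")
print(score)
```

Note that this only changes the scoring. The match line printed for an alignment may still mark R against A with a dot, because the letters are not identical.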

4.3 years ago by Alex Reynolds 35k


Have a look here. You can specify callback functions to do just what Alex Reynolds suggested above.
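A match callback of the kind mentioned here could look roughly like the following sketch. The function name `iupac_match_score`, the +1/-1 scores, and the overlap rule are assumptions made for illustration. The table lists the standard IUPAC nucleotide codes.

```python
# IUPAC nucleotide codes and the bases each one can represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_match_score(a, b, match=1.0, mismatch=-1.0):
    """Hypothetical callback: a match if the two letters share at least one base."""
    bases_a = set(IUPAC_BASES[a.upper()])
    bases_b = set(IUPAC_BASES[b.upper()])
    return match if bases_a & bases_b else mismatch


print(iupac_match_score("R", "A"))  # R = A or G, so this counts as a match
print(iupac_match_score("R", "C"))  # no shared base, so this is a mismatch
```

With the older `Bio.pairwise2` module, a function of this shape can be passed as the match callback, for example `pairwise2.align.globalcs(seq1, seq2, iupac_match_score, -2, -1)`, where the last two arguments are assumed gap open and extend penalties. `Bio.pairwise2` is deprecated in recent Biopython releases in favour of `PairwiseAligner`.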

4.3 years ago by cschu181 ★ 2.8k


I could be wrong, but I don't think a . is a mismatch. It normally signifies a 'similarity'. A mismatch is generally left completely blank, I think.

So technically it's not incorrect to say R is 'similar' to A or G.
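Whichever convention the library uses for the dot, the middle line can also be drawn independently of it. The sketch below is one possible way to do that, assuming two already-aligned rows of equal length with `-` for gaps. The function name `match_line` and the symbol choice (`|` for compatible letters, `.` for incompatible ones, a space at gaps) are illustrative assumptions.

```python
# IUPAC nucleotide codes and the bases each one can represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_line(row_a, row_b):
    """Hypothetical helper: build the middle line for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    symbols = []
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            symbols.append(" ")  # gap column
        elif set(IUPAC_BASES[a]) & set(IUPAC_BASES[b]):
            symbols.append("|")  # identical or compatible letters
        else:
            symbols.append(".")  # incompatible letters
    return "".join(symbols)


top = "ACTGACTGRTGC"
bottom = "ACTGACTGATGC"
print(top)
print(match_line(top, bottom))
print(bottom)
```

For the sequences from the question this prints a full row of `|`, since R is compatible with A.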

4.3 years ago by Joe 21k
