How to align two strings with an ambiguous alphabet using Biopython?
4.3 years ago
ACGC ▴ 10

My aim is to have alignments like this:

 ACTGACTGRTGC
 |||||||||||| 
 ACTGACTGATGC

(R stands for A or G), but at the moment my code treats ambiguous IUPAC letters as mismatches:

ACTGACTGRTGC
||||||||.|||
ACTGACTGATGC

My code:

from Bio import Align
from Bio.Seq import Seq

# Bio.Alphabet (where IUPAC lived) was removed in Biopython 1.78,
# so Seq objects no longer take an alphabet argument.
seq1 = Seq("ACTGACTGRTGC")
seq2 = Seq("ACTGACTGATGC")

# Default scoring: identical letters score 1, everything else 0.
aligner = Align.PairwiseAligner()
alignments = aligner.align(seq1, seq2)
for alignment in sorted(alignments):
    print(alignment.score)
    print(alignment)

Do you have any hints on how to get ambiguous DNA letters aligned as matches?

alignment Biopython • 1.9k views

Thanks to you all! I reached my aim by adding my own substitution matrix, which is used by a callback function passed as an argument to the pairwise2.align function :) @Joe: In this case a dot stands for a mismatch, as format_alignment() in pairwise2.py shows.
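
For anyone landing here later, the callback approach might look roughly like the sketch below. This is not the exact code from the comment above: the names `IUPAC_DNA`, `iupac_match` and `match_line` are made up for illustration, and the 1.0/0.0 scores are an assumption. The pairwise2 part is guarded because the module is deprecated in recent Biopython releases and may not be importable everywhere.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_match(a, b):
    """Hypothetical callback: 1.0 if the two codes share a base, else 0.0."""
    shared = set(IUPAC_DNA.get(a, "")) & set(IUPAC_DNA.get(b, ""))
    return 1.0 if shared else 0.0


def match_line(a, b):
    """Build a '|' / '.' line for two already aligned, equal-length strings."""
    symbols = []
    for x, y in zip(a, b):
        if "-" in (x, y):
            symbols.append(" ")  # gap column
        elif iupac_match(x, y) > 0:
            symbols.append("|")  # compatible IUPAC codes
        else:
            symbols.append(".")  # true mismatch
    return "".join(symbols)


seq1 = "ACTGACTGRTGC"
seq2 = "ACTGACTGATGC"
print(seq1)
print(match_line(seq1, seq2))
print(seq2)

# Optional: let pairwise2 score the alignment with the same callback.
try:
    from Bio import pairwise2
except ImportError:
    pairwise2 = None

if pairwise2 is not None:
    # "c" = match scores come from a callback, "x" = no gap penalties.
    best = pairwise2.align.globalcx(seq1, seq2, iupac_match,
                                    one_alignment_only=True)[0]
    print(best[2])  # index 2 of the result holds the alignment score
```

The match line is drawn by the small helper rather than by format_alignment(), since the latter marks non-identical letters with a dot regardless of the score the callback returned.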


Perhaps you could specify a custom substitution matrix as a parameter to the aligner that handles IUPAC ambiguities as matches, so that an alignment of A to R, for instance, would score as well as A to A (and likewise for G).
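
A minimal sketch of that idea with `Align.PairwiseAligner`, assuming a reasonably recent Biopython that provides `Bio.Align.substitution_matrices.Array`. The helper names (`IUPAC_DNA`, `iupac_score`, `build_aligner`), the 1.0/0.0 scores and the gap penalties are my own assumptions, not anything prescribed by the library. The import is guarded so the scoring logic can still be tried without Biopython installed.

```python
from itertools import product

try:
    from Bio import Align
    from Bio.Align import substitution_matrices
except ImportError:  # Biopython not installed
    Align = None

# IUPAC nucleotide codes mapped to the set of bases they can stand for.
IUPAC_DNA = {
    "A": set("A"), "C": set("C"), "G": set("G"), "T": set("T"),
    "R": set("AG"), "Y": set("CT"), "S": set("CG"), "W": set("AT"),
    "K": set("GT"), "M": set("AC"), "B": set("CGT"), "D": set("AGT"),
    "H": set("ACT"), "V": set("ACG"), "N": set("ACGT"),
}


def iupac_score(a, b, match=1.0, mismatch=0.0):
    """Score two codes as a match when they share at least one base."""
    return match if IUPAC_DNA[a] & IUPAC_DNA[b] else mismatch


def build_aligner():
    """Return a PairwiseAligner that scores compatible IUPAC codes as matches."""
    letters = "".join(IUPAC_DNA)
    matrix = substitution_matrices.Array(alphabet=letters, dims=2)
    for a, b in product(letters, repeat=2):
        matrix[a, b] = iupac_score(a, b)
    aligner = Align.PairwiseAligner()
    aligner.substitution_matrix = matrix
    # Assumed gap penalties, so that gaps never beat a plain mismatch.
    aligner.open_gap_score = -1.0
    aligner.extend_gap_score = -0.5
    return aligner


if Align is not None:
    aligner = build_aligner()
    alignment = aligner.align("ACTGACTGRTGC", "ACTGACTGATGC")[0]
    print(alignment.score)
    print(alignment)
```

One caveat: as far as I know, the printed alignment draws "|" only between identical letters, so the R/A column may still be shown with a dot even though it now scores as a match.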


Have a look here. You can specify callback functions to do just what Alex Reynolds suggested above.


I could be wrong, but I don't think a . is a mismatch. It normally signifies a 'similarity'. A mismatch is generally left completely empty, I think.

So technically it's not incorrect to say R is 'similar' to A or G.
