4.3 years ago
ACGC
My aim is to have alignments like this:
ACTGACTGRTGC
||||||||||||
ACTGACTGATGC
(R stands for A or G), but at the moment my code treats ambiguous IUPAC letters as mismatches:
ACTGACTGRTGC
||||||||.|||
ACTGACTGATGC
My code:
from Bio.Seq import Seq
from Bio import Align

# Bio.Alphabet (and the alphabet= argument of Seq) was removed in
# Biopython 1.78, so the sequences are created without an alphabet.
seq1 = Seq("ACTGACTGRTGC")
seq2 = Seq("ACTGACTGATGC")
aligner = Align.PairwiseAligner()
alignments = aligner.align(seq1, seq2)
for alignment in sorted(alignments):
    print(alignment.score)
    print(alignment)
Do you have any hints on how to get ambiguous DNA letters aligned as matches?
Thanks to you all! By adding my own substitution matrix, which a callback function passed as an argument to the pairwise2.align function looks up, I reached my aim :) @Joe: In this case a dot stands for a mismatch, as format_alignment() in pairwise2.py shows.
Perhaps you could specify a custom substitution matrix as a parameter to the aligner that handles IUPAC ambiguities as matches, so that an alignment of A to R, for instance, would score as well as A to A (and likewise for G).
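A minimal sketch of that idea with Biopython's PairwiseAligner, assuming a reasonably recent Biopython where `Bio.Align.substitution_matrices.Array` is available. The helper name `build_iupac_matrix`, the scores (1 for compatible letters, -1 otherwise), and the gap penalties are illustrative choices, not anything from the original post. Two codes are treated as a match when the sets of bases they stand for overlap.

```python
from Bio import Align
from Bio.Align import substitution_matrices

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_iupac_matrix(match=1.0, mismatch=-1.0):
    """Hypothetical helper: score two codes as a match if their base sets overlap."""
    letters = "".join(IUPAC_DNA)
    matrix = substitution_matrices.Array(alphabet=letters, dims=2)
    for a in letters:
        for b in letters:
            overlap = set(IUPAC_DNA[a]) & set(IUPAC_DNA[b])
            matrix[a, b] = match if overlap else mismatch
    return matrix


aligner = Align.PairwiseAligner()
aligner.substitution_matrix = build_iupac_matrix()
# Assumed gap penalties, chosen so the ungapped alignment is clearly preferred.
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -1.0

alignments = aligner.align("ACTGACTGRTGC", "ACTGACTGATGC")
best = alignments[0]
print(best.score)
print(best)
```

With this matrix the R/A column contributes to the score like any other match. The printed match line may still show a dot in that column, because the text rendering appears to depend on letter identity rather than on the score.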
Have a look here. You can specify callback functions to do just what Alex Reynolds suggested above.
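A callback of that kind could look roughly like the sketch below. It is plain Python with no Biopython dependency; the names `iupac_match` and `match_line` and the scores are made up for illustration. The second helper draws the match line by compatibility instead of identity, which gives the display asked for in the question.

```python
# IUPAC nucleotide codes mapped to the set of bases each one can represent.
IUPAC_DNA = {
    code: set(bases)
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def iupac_match(a, b, match=1.0, mismatch=-1.0):
    """Hypothetical callback: return `match` if the two codes share a base."""
    shared = IUPAC_DNA.get(a.upper(), set()) & IUPAC_DNA.get(b.upper(), set())
    return match if shared else mismatch


def match_line(seq_a, seq_b, score_fn=iupac_match):
    """Draw '|' for compatible letters, '.' for mismatches, ' ' at gaps."""
    symbols = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            symbols.append(" ")
        elif score_fn(a, b) > 0:
            symbols.append("|")
        else:
            symbols.append(".")
    return "".join(symbols)


top = "ACTGACTGRTGC"
bottom = "ACTGACTGATGC"
print(top)
print(match_line(top, bottom))
print(bottom)
```

In pairwise2, a function with this two-letter signature is what the callback variants expect as the match argument, e.g. `pairwise2.align.globalcx(seq1, seq2, iupac_match)`. Note that pairwise2 is marked as deprecated in recent Biopython releases in favour of PairwiseAligner.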
I could be wrong, but I don't think a . is a mismatch. It normally signifies a 'similarity'. A mismatch is generally left completely empty, I think. So technically it's not incorrect to say R is 'similar' to A or G.