pairwise2 extended_protein alphabet
1
0
Entering edit mode
6.1 years ago
aocken • 0

I am getting a Key Error when I try to use pairwise2 alignment. I have a translated generic_protein sequence and an extended_protein sequence from an indexed fasta file.

File "/biopython-1.70/Bio/pairwise2.py", line 407, in __call__
    return _align(**keywds)
  File "/biopython-1.70/Bio/pairwise2.py", line 457, in _align
    align_globally, score_only)
  File "/biopython-1.70/Bio/pairwise2.py", line 649, in _make_score_matrix_fast
    match_fn(sequenceA[row - 1], sequenceB[col - 1])
  File "/biopython-1.70/Bio/pairwise2.py", line 982, in __call__
    return self.score_dict[(charA, charB)]
KeyError: ('U', 'M')

Can pairwise2 do an alignment with the IUPAC Extended Protein Alphabet?

Example sequences that return the error above:

Translated generic_protein: 
MYISPEEFKPIAEKLTGSVPVANYEEEELSHDPSEETVTIESRFQPLLTETMTKSKDGFLGVSRLALSGLRNWTTAASPSAVFAARHFRPFLPPPGQELGQPWWIIPGELSVFTGYLSNNRFYPPPPKGKEVRTALGDRGCTWPQLESGPSCVNLDPST

Indexed extended_protein sequence:
MPSVFQKNIISRVSQIQLVPTGTTPPAKMPEDTAENWPGLVIMATTEHKPRFNYGGGGGGGGGTPLLLSGSSCPLVRTFGTGIGRLGPDSCLLAQPLPPKQESALKVLGTDGLFLFSSLDTDQDMYISPEEFKPIAEKLTGSVPEASYEEEELHHDPSEETLTIEARFQPLLTETMTKSKDGFLGVSRLALSGLRNWTTAASPTAEFAARHFRAFLPPPGQELGQPWWIIPGELSVFTGYLSNNRFYPPPPKDKEVIIHRLLSMFHPRPFVKTRFAPQGTVACLTAISDSYYTVMFRIHAEFQLSEPPDFPFWFSPGQFTGHIILSKDATHIRDFRLFVPNHRSLNVDMEWLYGASETSNMEVDIGYIPQMELEAMGPSVPSVILDEDGNMIDSRLPSGEPLQFVFEEIKWHQELSWEEAARRLEVAMYPFKKVSYLPFTEAFDRAKAENKLVHSILLWGALDDQSCUGSGRTLRETVLESPPILTLLNESFISTWSLVKELEDLQSQQENALHRQLAGLHLEKYSFPVEMMVCLPNGTVVHHINANYFLDITSMKPEDLDNKVFSFSSTFEDPSTATYMQFLREGLRRGLPLLEP
alignment sequence biopython python pairwise2 • 2.0k views
ADD COMMENT
3
Entering edit mode
6.1 years ago

The problem is that matrix used for amino acid substitutions does not include extra amino acids (e.g. U). And this causes further errors. However, you can easily overcome this issue by creating a callback function that will handle such cases.

Python code:

from Bio import pairwise2
from Bio.pairwise2 import format_alignment
from Bio.SubsMat import MatrixInfo as matlist

seq1 = 'KAENKLVHSIILWSALDQSCUGSGRTLRETVLESP'        # There is U in it.
seq2 = 'KAENKLVHSILLWGALDDQSCGSGRTLRETVLESP'

matrix = matlist.blosum62

# Callback function
def matrix_get(aa1, aa2):
    if (aa1, aa2) in matrix:
        val = matrix[(aa1, aa2)]
    elif (aa2, aa1) in matrix:
        val = matrix[(aa2, aa1)]
    else:
        val = 0
    return val

for a in pairwise2.align.globalcs(seq1, seq2, matrix_get, -10, -0.5):
    print(format_alignment(*a))

Output:

KAENKLVHSIILWSALD-QSCUGSGRTLRETVLESP
||||||||||||||||||||||||||||||||||||
KAENKLVHSILLWGALDDQSC-GSGRTLRETVLESP
  Score=157

KAENKLVHSIILWSAL-DQSCUGSGRTLRETVLESP
||||||||||||||||||||||||||||||||||||
KAENKLVHSILLWGALDDQSC-GSGRTLRETVLESP
  Score=157
ADD COMMENT

Login before adding your answer.

Traffic: 2400 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6