Question: pairwise2 extended_protein alphabet
0
gravatar for aocken
3 months ago by
aocken0
aocken0 wrote:

I am getting a Key Error when I try to use pairwise2 alignment. I have a translated generic_protein sequence and an extended_protein sequence from an indexed fasta file.

File "/biopython-1.70/Bio/pairwise2.py", line 407, in __call__
    return _align(**keywds)
  File "/biopython-1.70/Bio/pairwise2.py", line 457, in _align
    align_globally, score_only)
  File "/biopython-1.70/Bio/pairwise2.py", line 649, in _make_score_matrix_fast
    match_fn(sequenceA[row - 1], sequenceB[col - 1])
  File "/biopython-1.70/Bio/pairwise2.py", line 982, in __call__
    return self.score_dict[(charA, charB)]
KeyError: ('U', 'M')

Can pairwise2 do an alignment with the IUPAC Extended Protein Alphabet?

Example sequences that return the error above:

Translated generic_protein: 
MYISPEEFKPIAEKLTGSVPVANYEEEELSHDPSEETVTIESRFQPLLTETMTKSKDGFLGVSRLALSGLRNWTTAASPSAVFAARHFRPFLPPPGQELGQPWWIIPGELSVFTGYLSNNRFYPPPPKGKEVRTALGDRGCTWPQLESGPSCVNLDPST

Indexed extended_protein sequence:
MPSVFQKNIISRVSQIQLVPTGTTPPAKMPEDTAENWPGLVIMATTEHKPRFNYGGGGGGGGGTPLLLSGSSCPLVRTFGTGIGRLGPDSCLLAQPLPPKQESALKVLGTDGLFLFSSLDTDQDMYISPEEFKPIAEKLTGSVPEASYEEEELHHDPSEETLTIEARFQPLLTETMTKSKDGFLGVSRLALSGLRNWTTAASPTAEFAARHFRAFLPPPGQELGQPWWIIPGELSVFTGYLSNNRFYPPPPKDKEVIIHRLLSMFHPRPFVKTRFAPQGTVACLTAISDSYYTVMFRIHAEFQLSEPPDFPFWFSPGQFTGHIILSKDATHIRDFRLFVPNHRSLNVDMEWLYGASETSNMEVDIGYIPQMELEAMGPSVPSVILDEDGNMIDSRLPSGEPLQFVFEEIKWHQELSWEEAARRLEVAMYPFKKVSYLPFTEAFDRAKAENKLVHSILLWGALDDQSCUGSGRTLRETVLESPPILTLLNESFISTWSLVKELEDLQSQQENALHRQLAGLHLEKYSFPVEMMVCLPNGTVVHHINANYFLDITSMKPEDLDNKVFSFSSTFEDPSTATYMQFLREGLRRGLPLLEP
ADD COMMENTlink modified 3 months ago by a.zielezinski8.3k • written 3 months ago by aocken0
2
gravatar for a.zielezinski
3 months ago by
a.zielezinski8.3k
a.zielezinski8.3k wrote:

The problem is that matrix used for amino acid substitutions does not include extra amino acids (e.g. U). And this causes further errors. However, you can easily overcome this issue by creating a callback function that will handle such cases.

Python code:

from Bio import pairwise2
from Bio.pairwise2 import format_alignment
from Bio.SubsMat import MatrixInfo as matlist

seq1 = 'KAENKLVHSIILWSALDQSCUGSGRTLRETVLESP'        # There is U in it.
seq2 = 'KAENKLVHSILLWGALDDQSCGSGRTLRETVLESP'

matrix = matlist.blosum62

# Callback function
def matrix_get(aa1, aa2):
    if (aa1, aa2) in matrix:
        val = matrix[(aa1, aa2)]
    elif (aa2, aa1) in matrix:
        val = matrix[(aa2, aa1)]
    else:
        val = 0
    return val

for a in pairwise2.align.globalcs(seq1, seq2, matrix_get, -10, -0.5):
    print(format_alignment(*a))

Output:

KAENKLVHSIILWSALD-QSCUGSGRTLRETVLESP
||||||||||||||||||||||||||||||||||||
KAENKLVHSILLWGALDDQSC-GSGRTLRETVLESP
  Score=157

KAENKLVHSIILWSAL-DQSCUGSGRTLRETVLESP
||||||||||||||||||||||||||||||||||||
KAENKLVHSILLWGALDDQSC-GSGRTLRETVLESP
  Score=157
ADD COMMENTlink modified 3 months ago • written 3 months ago by a.zielezinski8.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1418 users visited in the last hour