Error when searching for matches using the PSSM score with proteins (.motif BioPython)
4 months ago

Hi everyone! I am asking this question because I have been searching for an answer here and in other resources for some time without finding anything. There is a very similar question here that is still unanswered: "pssm.search from Bio.motifs not working with amino acid sequences". The problem is this: I am trying to find a motif in a set of sequences, so I followed the documentation at https://biopython.readthedocs.io/en/latest/chapter_motifs.html. But when I search for the motif with the PSSM I constructed, I get this error:

/usr/local/lib/python3.7/dist-packages/Bio/motifs/matrix.py in __init__(self, alphabet, values)
37             if self.length is None:
38                 self.length = len(values[letter])
39             elif self.length != len(values[letter]):
40                 raise Exception("data has inconsistent lengths")
41             self[letter] = list(values[letter])

KeyError: 'D'


I suspect that this has something to do with the alphabet being used. I am working with proteins (so I use the one-letter amino acid codes), but the search seems to assume nucleotides (the first two rows of my matrix are the amino acids A and C, and the third is D). My questions are:

• Is there any way to solve this, i.e. to make this search work with amino acids?
• Is Bio.motifs suitable for proteins? (The example in the documentation deals with nucleotides.)
• If you cannot answer the questions above, do you know of another programmatic way to find motifs in proteins? I have a dataset of motif patterns that have been shown to be cleaved.
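
To make the alphabet suspicion concrete, here is a minimal sketch of how a KeyError: 'D' could arise. It assumes (this is not confirmed anywhere in the thread) that some step of the search hands the constructor rows for A, C, G and T only, while the matrix alphabet is the 20 amino acids. The function build_matrix is a hypothetical stand-in modelled on the lines in the traceback, not Biopython's actual class.

```python
# Hypothetical stand-in for the constructor lines shown in the traceback.
# This is NOT Biopython's real class, only an illustration of the loop.
def build_matrix(alphabet, values):
    length = None
    matrix = {}
    for letter in alphabet:
        column = values[letter]  # KeyError if this letter has no row
        if length is None:
            length = len(column)
        elif length != len(column):
            raise Exception("data has inconsistent lengths")
        matrix[letter] = list(column)
    return matrix


protein_alphabet = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: rows exist for the four DNA letters only.
dna_only_values = {letter: [0.0, 0.0, 0.0] for letter in "ACGT"}

try:
    build_matrix(protein_alphabet, dna_only_values)
    failed_letter = None
except KeyError as err:
    failed_letter = err.args[0]

print(failed_letter)  # D
```

A and C are found because they are also DNA letters; D is the first protein letter with no row, which would match the error above.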
protein motif python biopython
4 months ago

Hi! I asked this same question on Biopython's GitHub page, and the answer is that pssm.search is designed for DNA sequences only. That means the source code would need to be changed to handle proteins. More information at this link: https://github.com/biopython/biopython/issues/3636
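
Since pssm.search is DNA-only, one possible workaround is to build and scan the matrix in plain Python. The following is only a sketch under stated assumptions: a uniform background over the 20 standard amino acids, a pseudocount of 1, and log2-odds scoring. The names build_pssm and scan_protein and the example peptides are hypothetical and not part of Biopython or of my dataset.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0, background=None):
    """Return one {residue: log2-odds} dict per motif position."""
    if background is None:
        # Assumption: uniform background; replace with real frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("all motif instances must have the same length")
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for inst in instances:
            counts[inst[pos].upper()] += 1  # KeyError on non-standard residues
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2(counts[aa] / total / background[aa])
             for aa in AMINO_ACIDS}
        )
    return pssm


def scan_protein(pssm, sequence, threshold=0.0):
    """Yield (start, score) for each window scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(res not in pssm[0] for res in window):
            continue  # skip windows with X, B, Z, gaps, etc.
        score = sum(column[res] for column, res in zip(pssm, window))
        if score >= threshold:
            yield start, score


# Made-up example data, for illustration only.
instances = ["DEVD", "DEVD", "DETD", "DEHD"]
pssm = build_pssm(instances)
hits = list(scan_protein(pssm, "MKLDEVDGAAPR", threshold=3.0))
print([start for start, _ in hits])  # [3]
```

Positions are 0-based, so the hit at 3 is the DEVD window. There is no reverse-strand search here because that concept does not apply to proteins.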