Hi everyone! I am asking this question because I have been searching here and in other resources for a while and have not found an answer. There is a very similar unanswered question here: "pssm.search from Bio.motifs not working with amino acid sequences".

The problem: I am trying to find a motif in a set of sequences. With this in mind, I followed the documentation at https://biopython.readthedocs.io/en/latest/chapter_motifs.html. But when I search for the motif with the PSSM I constructed, I get this error:
    /usr/local/lib/python3.7/dist-packages/Bio/motifs/matrix.py in __init__(self, alphabet, values)
         37             if self.length is None:
         38                 self.length = len(values[letter])
         39             elif self.length != len(values[letter]):
         40                 raise Exception("data has inconsistent lengths")
         41             self[letter] = list(values[letter])

    KeyError: 'D'
I suspect that this has something to do with the alphabet that is being used. I am dealing with proteins (so I use the amino acid one-letter codes), but it seems that the search assumes nucleotides: the first two rows of my matrix are the amino acids A and C, which are also valid nucleotide letters, and the third is D, which is not, and that is exactly the key the error complains about. My questions are:
- Is there any way to solve this, i.e. to make this search work with amino acids?
- Is Bio.motifs suitable for proteins? (The example in the documentation deals with nucleotides.)
- If you cannot answer the questions above, do you know of another programmatic way to find motifs in proteins? I have a dataset of motif patterns that have been proven to be cleaved.
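
To make the third question concrete, here is a minimal sketch of the kind of search I mean, written in plain Python so that it does not depend on Bio.motifs at all. It is only an illustration under assumptions: the function names `build_pssm` and `search_pssm` are hypothetical, the background is assumed to be uniform over the 20 standard amino acids, a pseudocount of 1 is an arbitrary choice, and all motif instances are assumed to be ungapped and of equal length.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per motif position) from equal-length
    motif instances. Assumes a uniform background over AMINO_ACIDS."""
    width = len(instances[0])
    if any(len(inst) != width for inst in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        # Start every residue at the pseudocount so no probability is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for inst in instances:
            counts[inst[pos]] += 1  # KeyError here means a non-standard letter
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def search_pssm(pssm, sequence, threshold):
    """Yield (start, score) for every window scoring at or above threshold.
    Windows containing letters outside AMINO_ACIDS are skipped."""
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(aa not in AMINO_ACIDS for aa in window):
            continue
        score = sum(column[aa] for column, aa in zip(pssm, window))
        if score >= threshold:
            yield start, score
```

With made-up instances such as `["DEVD", "DEVD", "DETD"]`, calling `list(search_pssm(build_pssm(instances), "MKADEVDGSA", 5.0))` reports the window starting at index 3 (0-based). The threshold has to be tuned to the data; 5.0 is just a value that separates the exact match from the other windows in this toy case.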