Hi everyone! I am asking this question because I have been searching for an answer here and in other resources for some time and have not found anything. There is a very similar unanswered question here: pssm.search from Bio.motifs not working with amino acid sequences. The problem is this: I am trying to find a motif in a set of sequences, so I followed the documentation at https://biopython.readthedocs.io/en/latest/chapter_motifs.html. But when I search for the motif with the PSSM I constructed, I get this error:
/usr/local/lib/python3.7/dist-packages/Bio/motifs/matrix.py in __init__(self, alphabet, values)
37 if self.length is None:
38 self.length = len(values[letter])
39 elif self.length != len(values[letter]):
40 raise Exception("data has inconsistent lengths")
41 self[letter] = list(values[letter])
KeyError: 'D'
I suspect this has something to do with the alphabet being used. I am working with proteins (so I use amino acid letter codes), but the search seems to assume nucleotides (the first two rows of my matrix are amino acids A and C, and the third is D). My questions are:
- Is there any way to solve this, i.e. to make the search work with amino acids?
- Is Bio.motifs suitable for proteins? (The example in the documentation deals with nucleotides)
- If you cannot answer the questions above, do you know of another way to find motifs in proteins programmatically? I have a dataset of motif patterns that have been proven to be cleaved.
Hello, have you been able to modify the matrix.py file to include non-DNA characters? Your help would be much appreciated. Regards
Hi Mostafa! I solved this problem using a module in Biopython, which you can import with
from Bio.Data import IUPACData
and then, when you create your motif object, you define the letters you are going to use, like
m = motifs.create(instances, alphabet=IUPACData.protein_letters)
This worked for me when creating a position-specific scoring matrix. I hope it works for you!
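In case the search step still rejects a protein alphabet in your Biopython version, here is a rough plain-Python sketch of the same idea (build a log-odds matrix from aligned instances, then slide it along a sequence). It does not use Bio.motifs at all. The function names `build_pssm` and `search`, the uniform background, the pseudocount, the threshold, and the example peptides are all my own assumptions for illustration, not anything from the Biopython API or from the original dataset.

```python
import math

# The 20 standard amino acids (same letters as IUPACData.protein_letters)
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, alphabet=PROTEIN_LETTERS, pseudocount=0.5):
    """Return a list of {letter: log2-odds} dicts, one per motif position."""
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("all instances must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pssm = []
    for pos in range(length):
        # Start every letter at the pseudocount so no frequency is zero
        counts = {letter: pseudocount for letter in alphabet}
        for inst in instances:
            counts[inst[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {l: math.log2(counts[l] / total / background) for l in alphabet}
        )
    return pssm


def search(pssm, sequence, threshold=0.0):
    """Yield (position, score) for each window scoring at or above threshold."""
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in pssm[0] for ch in window):
            continue  # skip windows with letters outside the alphabet (e.g. X)
        score = sum(col[ch] for col, ch in zip(pssm, window))
        if score >= threshold:
            yield start, score


# Hypothetical instances and target sequence, for illustration only
instances = ["DEVD", "DEHD", "DQVD", "DEVD"]
pssm = build_pssm(instances)
hits = list(search(pssm, "MKTAYDEVDGGSLLDQVDA", threshold=5.0))
for position, score in hits:
    print(position, round(score, 2))
```

The threshold is arbitrary here; for real data you would probably want to pick it from the score distribution of your known cleaved motifs, and replace the uniform background with amino acid frequencies from your own sequences.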