Error when searching for matches using the PSSM score with proteins (Bio.motifs, Biopython)
5 months ago

Hi everyone! I am asking this question because I have been searching for an answer here and in other resources for a while and have not found anything. There is a very similar unanswered question here: pssm.search from Bio.motifs not working with amino acid sequences.

The problem: I am trying to find a motif in a set of sequences. With this in mind, I followed the documentation at https://biopython.readthedocs.io/en/latest/chapter_motifs.html. But when I search for the motif with the PSSM I constructed, I get this error:

/usr/local/lib/python3.7/dist-packages/Bio/motifs/matrix.py in __init__(self, alphabet, values)
 37             if self.length is None:
 38                 self.length = len(values[letter])
 39             elif self.length != len(values[letter]): 
 40                 raise Exception("data has inconsistent lengths")
 41             self[letter] = list(values[letter])

  KeyError: 'D'

I suspect that this has something to do with the alphabet being used. I am working with proteins (so I use amino acid one-letter codes), but it seems that search assumes nucleotides (the first two rows of my matrix are amino acids A and C, and the third is D). My questions are:

  • Is there any way to solve this, i.e. to make this search work with amino acids?
  • Is Bio.motifs suitable for proteins? (The example in the documentation deals with nucleotides.)
  • If you do not know the answer to the questions above, do you know of another programmatic way to find motifs in proteins? I have a dataset of motif patterns that have been shown to be cleaved.
protein motif python biopython
5 months ago

Hi! I asked this same question on Biopython's GitHub page, and the answer is that pssm.search is designed for DNA sequences only. That means the source code would need to be changed in order to handle proteins. More information in the link: https://github.com/biopython/biopython/issues/3636
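
In case it helps anyone with the same problem, here is a minimal sketch of one possible workaround that does not use Bio.motifs at all: build a log-odds matrix from the known motif instances and slide it along each protein sequence in plain Python. This is not Biopython's implementation. The function names (build_pssm, search_pssm), the uniform background, the pseudocount of 1 and the example instances are all my own assumptions for illustration, so adjust them to your data.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a log2-odds matrix from equal-length motif instances.

    Assumes a uniform background (1 / len(alphabet)); swap in real
    amino acid frequencies if you have them.
    """
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all instances must have the same length")
    background = 1.0 / len(alphabet)
    total = len(instances) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        column = {}
        for letter in alphabet:
            count = sum(1 for seq in instances if seq[pos] == letter)
            freq = (count + pseudocount) / total
            column[letter] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def search_pssm(pssm, sequence, threshold=0.0):
    """Yield (start, score) for each window scoring at or above threshold."""
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        try:
            score = sum(col[res] for col, res in zip(pssm, window))
        except KeyError:
            # Skip windows with letters outside the alphabet (e.g. X).
            continue
        if score >= threshold:
            yield start, score


# Hypothetical cleavage-site instances and target sequence, made up for the demo.
instances = ["DEVD", "DEVD", "DQTD", "DEHD"]
pssm = build_pssm(instances)
for start, score in search_pssm(pssm, "MKDEVDGAA", threshold=5.0):
    print(start, round(score, 2))
```

The threshold is on the same log2-odds scale as the matrix, so a sensible value depends on the motif length and on how conserved your instances are. Scoring is pure Python, so it will be slow on very large datasets.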
