Error when searching for matches using the PSSM score with proteins (.motif BioPython)
2.8 years ago
Agenor Neto ▴ 10

Hi everyone! I am asking this question because I have been searching here and in other resources for a while and have not found an answer. There is a very similar unanswered question here: "pssm.search from Bio.motifs not working with amino acid sequences". The problem: I am trying to find a motif in a set of sequences, so I followed the documentation at https://biopython.readthedocs.io/en/latest/chapter_motifs.html. But when I search for the motif with the PSSM I constructed, I get this error:

/usr/local/lib/python3.7/dist-packages/Bio/motifs/matrix.py in __init__(self, alphabet, values)
 37             if self.length is None:
 38                 self.length = len(values[letter])
 39             elif self.length != len(values[letter]): 
 40                 raise Exception("data has inconsistent lengths")
 41             self[letter] = list(values[letter])

  KeyError: 'D'

I suspect that this has something to do with the alphabet being used. I am working with proteins (so I use one-letter amino acid codes), but search seems to assume nucleotides: the first two rows of my matrix are the amino acids A and C, and the third is D, which is the key that fails. My questions are:

  • Is there any way to solve this? (I mean, can this search be made to work with amino acids?)
  • Is Bio.motifs suitable for proteins? (The example in the documentation deals with nucleotides.)
  • If you cannot answer the questions above, do you know of another programmatic way to find motifs in proteins? I have a dataset of motif patterns that have been proven to be cleaved.
protein motif python biopython • 962 views
2.8 years ago
Agenor Neto ▴ 10

Hi! I asked this same question on Biopython's GitHub page, and the answer is that pssm.search is designed for DNA sequences only. That means the source code would need to be changed to handle proteins. More information at: https://github.com/biopython/biopython/issues/3636
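
For anyone who needs a scan over protein sequences in the meantime, here is a minimal pure-Python sketch of the same idea. It is not Biopython's implementation: the function names build_pssm and search, the uniform background, the pseudocount of 1 and the example sequences are all my own assumptions for illustration, so adjust them to your data.

```python
import math
from collections import Counter

PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(instances, alphabet=PROTEIN_LETTERS, pseudocount=1.0):
    """Return one {letter: log2-odds score} dict per motif position.

    Assumption: uniform background frequencies (1/20 for proteins).
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(alphabet)
    total = len(instances) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in instances)
        pssm.append({
            letter: math.log2(((counts[letter] + pseudocount) / total) / background)
            for letter in alphabet
        })
    return pssm


def search(pssm, sequence, threshold=0.0):
    """Yield (start, score) for each window scoring at or above threshold."""
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(letter not in pssm[0] for letter in window):
            continue  # skip windows containing non-standard residues such as X
        score = sum(column[letter] for column, letter in zip(pssm, window))
        if score >= threshold:
            yield start, score


# Made-up motif instances and target sequence, for illustration only.
instances = ["AKRD", "AKRE", "GKRD", "AKKD"]
pssm = build_pssm(instances)
hits = list(search(pssm, "MMAKRDLLGKRDXAKRD", threshold=3.0))
for start, score in hits:
    print(start, round(score, 2))
```

The threshold is in log2-odds units, so what counts as a hit depends on the motif length and on how many instances you have. It is worth checking the scores of known positives before picking a cutoff.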

Hello, have you been able to modify the matrix.py file to include non-DNA characters? Your help would be much appreciated. Regards.

Hi Mostafa! I solved this problem with a module that ships with Biopython. Import it with from Bio.Data import IUPACData, then specify the letters you are going to use when you create your motif object, for example m = motifs.create(instances, alphabet=IUPACData.protein_letters). This worked for me when creating a position-specific scoring matrix. I hope it works for you!
