Biopython to compute the distance between residues
1
0
Entering edit mode
2.0 years ago
lisa • 0

Hello,

I am trying to use Biopython to find the distance between the alpha carbons for a pair of residues in a model:

from Bio.PDB.PDBParser import PDBParser
parser = PDBParser()
structure = parser.get_structure("4HHB", "4HHB.pdb")
# Create a list of all the residues
residues = []
for model in structure.get_list():
for chain in model.get_list():
for residue in chain.get_list():
residues.append(residue)
# Calculate the distance between the alpha carbons for a pair of residues
firstResidue = residues[0]
secondResidue = residues[1]
diff = firstResidue["CA"].get_coord() - secondResidue["CA"].get_coord()
distance = numpy.sqrt(numpy.sum(diff * diff))


However, I get the error, KeyError: 'CA'. I'm new to Biopython and was hoping someone with more experience can help me understand how to correctly reference atoms in the residues.

Thank you.

Biopython • 3.0k views
ADD COMMENT
0
Entering edit mode

Hello lisa,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Also, post your complete code here with some background information about the input; best way is to host the file somewhere and provide a link for that here at biostars or provide first few/ relevant lines here.

Thank you!

ADD REPLY
0
Entering edit mode

Hello, thank you for your response. I have updated my question to include the complete code.

ADD REPLY
0
Entering edit mode

Can you please post the exact error? I can’t see anything obviously wrong with your approach so far.

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
2.0 years ago
Joe 19k

I can’t see where the exact error is coming from, but from playing with it, it seems that firstResidue and secondResidue aren’t what they seem to be. I think it has something to do with your use of get_list(), which isn’t the ‘normal’ way of dealing with these objects in my experience.

This code works (I haven’t checked the maths though). I’m using a dummy structure (1A80). See a similar question/answer here: Creating a distance matrix from PDB file coordinates (Python)

import numpy
from Bio.PDB.PDBParser import PDBParser

parser = PDBParser()
structure = parser.get_structure("1A80", "1A80.pdb")

residues = [r for r in structure.get_residues()]        # You don’t need the extensive list traversal you were doing before because there are many get_*() methods
# Calculate the distance between the alpha carbons for a pair of residues
one  = residues[0]["CA"].get_coord()
two = residues[1]["CA"].get_coord()
print('{} - {} = {}'.format(one,two, numpy.linalg.norm(one-two)))


A linear algebra ‘normal’ should be equivalent to the Euclidean distance between the arrays, but check my maths by all means.

I get ~4A, which sounds sensible though:

[-3.606  8.443  4.224] - [-3.507  6.781  7.627] = 3.78846311569

ADD COMMENT
0
Entering edit mode

Hello, thank you for your help. The residues[0]["CA"].get_coord() works on its own. It seems the key error arises when I try to do this for all the residues. For example:

residues = [r for r in structure.get_residues()]
distances = numpy.empty(len(residues), len(residues))
for x in range(len(residues)):
for y in range(len(residues)):
one = residues[x]["CA].get_coord()
two = residues[y]["CA"].get_coord()
distances[x, y] =  numpy.linalg.norm(one-two)

ADD REPLY
0
Entering edit mode

Fairly sure this is caused by non-amino acid elements in the list as they wont have alpha carbons.

Try the following:

import numpy
import itertools
from Bio.PDB.PDBParser import PDBParser

parser = PDBParser()
structure = parser.get_structure("1A80", "1A80.pdb")

residues = [r for r in structure.get_residues() if r.get_id()[0] == " "]
for each in itertools.combinations(residues, 2):
one  = each[0]["CA"].get_coord()
two = each[1]["CA"].get_coord()
print('{} - {} = {}'.format(one,two, numpy.linalg.norm(one-two)))

ADD REPLY

Login before adding your answer.

Traffic: 1915 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6