Biopython SASA looks wrong and disagrees with and Pymol SASA
2
0
Entering edit mode
2.8 years ago
tomwr1ght • 0

I'm trying to calculate solvent accessible surface area using Biopython.

However, the values from biopython do not make sense when looking at the crystal model visually in Pymol. Furthermore, when I calculate the SASA using Pymol, those values match well visually, but do not agree with the biopython values.

Why is biopython producing supposedly wrong SASA values?

Here are the values given by Biopython (~180 is max):

[('GLY', 3.4), ('ILE', 0.0), ('VAL', 0.0), ('GLU', 4.25), ('GLN', 0.85), ('CYS', 0.0), ('CYS', 0.0), ('THR', 45.57), ('SER', 55.2), ('ILE', 42.79)]

You can see that Gly1 has a tiny value. However, looking at the model visually (see picture, surface in blue), Gly 1 has a lot of exposed surface.

enter image description here

Also, the calculated value using pymol is 67.44. This shows some exposure. Why is the biopython code so different and seemingly wrong?

Biopython code:

from Bio.PDB.SASA import ShrakeRupley

from Bio.PDB.MMCIFParser import MMCIFParser

parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("HELLO", "insulin.cif")

sr = ShrakeRupley()

sr.compute(structure[0], level="R")

my_list = []
for chain in structure[0]:
    for res in chain:
        my_list.append((res.get_resname(),round(res.sasa,2)))

print(my_list[0:10])
biopython python pymol • 3.6k views
ADD COMMENT
2
Entering edit mode
2.8 years ago
Mensur Dlakic ★ 28k

I don't know how BioPython or PyMol calculate SASA, but any comparison would require knowing both of the underlying algorithms. The algorithm I know about, and it has been around for many decades, is implemented in dssp. It expresses accessibility as an area in angstroms squared, where the maximum exposure for each amino acid is as follows (Rost & Sander 1994, https://onlinelibrary.wiley.com/doi/10.1002/prot.340200303):

A 106
C 135
D 163
E 194
F 197
G 84
H 184
I 169
K 205
L 164
M 188
N 157
P 136
Q 198
R 248
S 130
T 142
V 142
W 227
Y 222

The SASA value is under the column ACC:

  #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
    1    1 A M              0   0   67      0, 0.0   265,-2.4     0, 0.0     2,-0.4   0.000 360.0 360.0 360.0 117.0   21.2   23.0   18.3
    2    2 A K  E     -A  265   0A  35    263,-0.2    27,-1.8    24,-0.2    26,-1.5  -0.889 360.0-164.9-104.4 134.9   23.1   20.0   16.9
    3    3 A F  E     -Ab 264  29A   1    261,-2.7   261,-2.0    -2,-0.4     2,-0.4  -0.934   3.6-170.9-118.5 137.9   25.0   20.0   13.7
    4    4 A V  E     -Ab 263  30A   1     25,-2.1    27,-2.8    -2,-0.4     2,-0.4  -0.988   3.5-165.0-129.0 134.2   26.3   17.1   11.7
    5    5 A S  E     +Ab 262  31A   0    257,-2.9   257,-2.5    -2,-0.4     2,-0.3  -0.977  14.2 173.5-116.9 135.8   28.7   17.3    8.7
    6    6 A F  E     - b   0  32A   0     25,-2.6    27,-1.8    -2,-0.4     2,-1.1  -0.944  31.5-151.3-151.1 120.8   29.1   14.2    6.5
    7    7 A N  E     - b   0  33A   0     -2,-0.3   252,-0.5    25,-0.2    27,-0.2  -0.838  23.5-177.3 -87.8 101.4   30.9   13.7    3.2
    8    8 A I        -     0   0    0     25,-1.4    -1,-0.2    -2,-1.1    26,-0.2   0.648  21.2-146.1 -78.3 -16.9   28.6   10.9    2.0
    9    9 A N  S    S+     0   0   16     24,-1.0    26,-0.2   248,-0.1    25,-0.1   0.952  82.6  12.6  47.1  62.2   30.5   10.3   -1.2
   10   10 A G    >>  -     0   0   12     24,-2.7     4,-1.0    23,-0.2     3,-0.6   0.766  69.0-159.2 105.1  82.3   27.5    9.4   -3.2

Maybe using dssp on your protein outside of python will give you a baseline for further comparisons. These are the values BioPython outputs for the first 10 residues of the same structure as above:

15.5
21.5
1.4
2.3
0.0
0.0
0.0
0.0
13.7
2.2

And you observed, these are all generally lower than values calculated by dssp. It is not a perfect correlation, but it seems like these may be relative accessibility numbers (in percent) rather than absolute values.

ADD COMMENT
2
Entering edit mode

BioPython can use dssp to calculate SASA - see here. It outputs relative accessibility, so you would also have to read in the residue and multiply by the numbers I posted above.

These are relative values (in percent) for the same 10 residues as above, and they are in the ballpark:

35.6
17.1
0.5
0.7
0.0
0.0
0.0
0.0
10.2
14.3
ADD REPLY
0
Entering edit mode

Hi Menur Dlakic,

Wow, thank you so much for your detailed help. Given that DSSP using Biopython seems a good bet I'll proceed with that.

Thanks again

ADD REPLY
1
Entering edit mode
2.7 years ago
laucalvet89 ▴ 10

I believe the problem is that you have included water molecules in the biopython calculation. Here is an example of code that should work:

from Bio.PDB import *
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom
from Bio.PDB.Chain import Chain
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.PDBList import PDBList
from Bio.PDB.SASA import ShrakeRupley

downloader=PDBList()
pdb='3I40'
my_file=downloader.retrieve_pdb_file(pdb_code=pdb)
parser = MMCIFParser()

structure = parser.get_structure("3I40", my_file)

chain1name='A'
mod_ch1=Model(1)

chain1 = Chain(chain1name)

ch1_struct=Structure("ch1")

num_count=0
for resi_ind, resi in enumerate(structure[0][chain1name].get_residues()):

    res_id = resi.get_full_id()
    resi_name=resi.get_resname()
    residue = Residue(res_id,resi_name,' ')
    if resi_name !='HOH':
        for at in structure[0][chain1name][res_id[3]].get_atoms():
            residue.add(at)
        chain1.add(residue)
mod_ch1.add(chain1)
ch1_struct.add(mod_ch1)

sr = ShrakeRupley()
sr.compute(ch1_struct[1], level="R")
my_list = []

for res in ch1_struct[1][chain1name]:
    my_list.append((res.get_resname(),round(res.sasa,2)))

print(my_list)

which gives: [('GLY', 58.62), ('ILE', 69.59), ('VAL', 72.46), ('GLU', 99.6), ('GLN', 73.04), ('CYS', 34.79), ('CYS', 102.89), ('THR', 115.84), ('SER', 50.84), ('ILE', 150.19),

ADD COMMENT

Login before adding your answer.

Traffic: 3399 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6