Question

Is the definition of the secondary_structure_fraction function wrong?

3.2 years ago

renjing679 • 0

As far as I understand, the amino acids E, M, A, and L are commonly found in helices, and V, I, Y, F, W, and L in sheets. The Biopython package, however, assigns V, I, Y, F, W, L to helix and E, M, A, L to sheet. Which one is correct?

To be specific, I'm working on a research project about enzymes and want to use the Biopython package to compute the secondary structure fraction. I looked at the source code (https://github.com/biopython/biopython/blob/master/Bio/SeqUtils/ProtParam.py), and here is how Biopython defines the secondary structure fraction:

def secondary_structure_fraction(self):
      """
      Calculate fraction of helix, turn and sheet.
      Returns a list of the fraction of amino acids which tend to be in Helix, Turn or Sheet.
      Amino acids in helix: V, I, Y, F, W, L.
      Amino acids in Turn: N, P, G, S.
      Amino acids in sheet: E, M, A, L.
      Returns a tuple of three floats (Helix, Turn, Sheet).
      """
      aa_percentages = self.get_amino_acids_percent()
      helix = sum(aa_percentages[r] for r in "VIYFWL")
      turn = sum(aa_percentages[r] for r in "NPGS")
      sheet = sum(aa_percentages[r] for r in "EMAL")
      return helix, turn, sheet

I looked at the documentation at https://biopython.org/docs/1.76/api/Bio.SeqUtils.ProtParam.html, but it lists no reference supporting this definition. I then searched the literature and textbooks and found, to my surprise, that E, M, A, and L are commonly found in helices, while V, I, Y, F, W, and L are prevalent in sheets. If so, I assume the code should be modified as follows:

def secondary_structure_fraction(self):
      aa_percentages = self.get_amino_acids_percent()
      helix = sum(aa_percentages[r] for r in "EMAL")
      turn = sum(aa_percentages[r] for r in "NPGS")
      sheet = sum(aa_percentages[r] for r in "VIYFWL")
      return helix, turn, sheet
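
The swapped grouping can also be tried without editing Biopython itself. The following is only a minimal standalone sketch: the function name composition_fractions and the example sequence are made up for illustration, and the residue groups are the ones proposed above, not a referenced standard.

```python
from collections import Counter

# Residue groups as proposed in the question (helix and sheet swapped
# relative to the quoted docstring). These are assumptions, not a standard.
HELIX = "EMAL"
TURN = "NPGS"
SHEET = "VIYFWL"


def composition_fractions(sequence):
    """Return (helix, turn, sheet) fractions based on residue counts only."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)  # missing residues count as 0
    total = len(seq)
    helix = sum(counts[r] for r in HELIX) / total
    turn = sum(counts[r] for r in TURN) / total
    sheet = sum(counts[r] for r in SHEET) / total
    return helix, turn, sheet


# Hypothetical toy sequence, used only to exercise the function.
helix, turn, sheet = composition_fractions("EMALVIYFWLNPGS")
print(f"helix={helix:.3f} turn={turn:.3f} sheet={sheet:.3f}")
```

Note that L appears in both the helix and the sheet group, so the three values can sum to more than 1.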

I'm new to protein science and not sure which one is correct. Does anyone have an idea?


score 0 · Answer 1 · 2022-06-13

Not sure whether you are interpreting the code correctly because the code context is missing, but you are correct in thinking that EMAL are more prevalent in helices while VIYFWL are prevalent in strands/sheets.

Here is a table of amino-acid propensities that I calculated from protein structures that were available in 2012-2013.

[Image: table of amino-acid secondary-structure propensities]

With all that said, I don't know why you would want to calculate fractions in such a crude way when there are easier and better ways. I suggest you go to the following website and enter your sequence:

https://embed.protein.properties/

That will give you a very accurate secondary structure prediction, from which it is easy to calculate fractions if that's what you need.
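
Computing fractions from a per-residue prediction could look roughly like the sketch below. It assumes the prediction is available as a plain string with one letter per residue using a three-state alphabet (H = helix, E = strand, C = coil); the actual output format of the website may differ, and the function name state_fractions and the example string are hypothetical.

```python
from collections import Counter


def state_fractions(ss_string, states="HEC"):
    """Return {state: fraction} for a per-residue secondary structure string.

    Assumes one character per residue; the H/E/C alphabet is an assumption
    and should be adjusted to whatever the prediction tool actually emits.
    """
    ss = ss_string.strip().upper()
    if not ss:
        raise ValueError("prediction string must not be empty")
    counts = Counter(ss)
    return {state: counts[state] / len(ss) for state in states}


# Hypothetical 16-residue prediction, for illustration only.
fractions = state_fractions("CCHHHHHHCCEEEECC")
print(fractions)
```

Unlike the composition-based estimate, each residue has exactly one state here, so the fractions sum to 1 as long as every character is in the chosen alphabet.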