How to get secondary structure from DSSP?
2.9 years ago
Xylanaser ▴ 80

Hey. Is there a quick way (Biopython or something similar) to retrieve the secondary structure from a DSSP file?

e.g. 'HHHHHHHHHHEEEEEEEEEEEECCCCCCCCHHHHHHEEEEEEEEE'

I need to get the secondary structure (SS) by PDB ID and build FASTA-like records:

>3gja
HHHHHHHHHHEEEEEEEEEEEECCCCCCCCHHHHHHEEEEEEEEE
>3akl
CCCCCCCCCCCCEEEEEEEEEEEHHHHHHHHHHEEEEEEEEEEEECCCCCCCCHHHHHHEEEEEEEEE
...

protein dssp pdb structure • 5.8k views
0

Hello, did you ever solve this?

3
2.9 years ago
Mensur Dlakic ★ 22k

Here is Biopython's DSSP code, with typical use shown near the top. It is not difficult to extract this information on your own: read column 17 of every residue line in the DSSP output, i.e. the lines that follow this header:

  #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
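
A minimal sketch of that manual approach, assuming a classic fixed-width DSSP file in which the amino acid sits at zero-based index 13 and the secondary structure at index 16 (column 17 when counting from 1). The function name `read_dssp_columns` and the demo lines are made up for illustration; the demo lines are synthetic, not real DSSP output.

```python
def read_dssp_columns(lines):
    """Return (sequence, secondary_structure) from classic DSSP output lines."""
    sequence = []
    sec_structure = []
    in_residues = False
    for line in lines:
        if not in_residues:
            # residue records start after the '  #  RESIDUE AA STRUCTURE' header
            if line.startswith("  #  RESIDUE"):
                in_residues = True
            continue
        if len(line) < 17 or line[13] == "!":
            # too short to hold an assignment, or a chain-break marker
            continue
        aa = line[13]
        if aa.islower():
            # DSSP writes disulfide-bonded cysteines as lowercase letters
            aa = "C"
        ss = line[16]  # column 17 when counting from 1
        sequence.append(aa)
        # a blank in the structure column means "no assignment"
        sec_structure.append("-" if ss == " " else ss)
    return "".join(sequence), "".join(sec_structure)


def fake_line(index, chain, aa, ss):
    # builds a synthetic residue line with the fields at the assumed positions
    return "%5d%5d %s %s  %s" % (index, index, chain, aa, ss)


demo = [
    "some header text",
    "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC",
    fake_line(1, "A", "M", " "),
    fake_line(2, "A", "K", "H"),
    fake_line(3, "A", "V", "E"),
    "%5d" % 4 + " " * 8 + "!" + " " * 3,  # chain-break marker
    fake_line(5, "A", "a", "T"),
]
seq, ss = read_dssp_columns(demo)
print(seq)
print(ss)
```

For a real file, something like `read_dssp_columns(open('1ako.dssp'))` would be the call, where the file name is only a placeholder.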


You may want to convert the 8-letter designation used by DSSP into 3 letters. A space (blank) in the structure column is treated as C. Typical conversion schemes are:

SS-Scheme 1: H,G,I->H ; E,B->E ; T,S->C
SS-Scheme 2: H,G->H ; E,B->E ; I,T,S->C    I think this is most common
SS-Scheme 3: H,G->H ; E->E ; I,B,T,S->C
SS-Scheme 4: H->H ; E,B->E ; G,I,T,S->C
SS-Scheme 5: H->H ; E->E ; G,I,B,T,S->C
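
The five schemes above could be encoded as lookup tables along these lines. This is only a sketch: `SCHEMES` and `reduce_ss` are names invented here, and both '-' and a blank are assumed to mean "no assignment" and are mapped to C.

```python
# For each scheme: which 8-state codes collapse to H and which to E.
# Every other code (T, S, '-', blank, and whatever is left out) becomes C.
SCHEMES = {
    1: {"H": "HGI", "E": "EB"},
    2: {"H": "HG", "E": "EB"},
    3: {"H": "HG", "E": "E"},
    4: {"H": "H", "E": "EB"},
    5: {"H": "H", "E": "E"},
}


def reduce_ss(ss8, scheme=2):
    """Convert an 8-state DSSP string to 3 states (H, E, C)."""
    groups = SCHEMES[scheme]
    table = {}
    for code in "HGIEBTS- ":
        target = "C"
        for three_state, members in groups.items():
            if code in members:
                target = three_state
        table[ord(code)] = target
    return ss8.translate(table)


example = "-EEB-GGG-HHHIIITTS"
print(reduce_ss(example, scheme=2))
print(reduce_ss(example, scheme=5))
```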

4
16 months ago
Mensur Dlakic ★ 22k

I thought it would be very simple to write code for this task given the clear instructions in the Biopython file I referenced above, but it seems that's not the case. Here is working code that downloads a structure and extracts its DSSP assignments. Note that Bio.PDB.DSSP runs the external DSSP program, which must be installed separately. If you already have PDB files somewhere, simply skip the download and point to them in the parsing and DSSP calls.

from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP
from Bio.PDB import PDBList

pdb_dl = PDBList()
pdb_list = ['1ako']
for i in pdb_list:
    pdb_dl.retrieve_pdb_file(i, pdir='./', file_format='pdb', overwrite=True)

# parse structure
p = PDBParser()
for i in pdb_list:
    structure = p.get_structure(i, './pdb%s.ent' % i)
    # use only the first model
    model = structure[0]
    # calculate DSSP
    dssp = DSSP(model, './pdb%s.ent' % i, file_type='PDB')
    # extract sequence and secondary structure from the DSSP tuple
    sequence = ''
    sec_structure = ''
    for a_key in dssp.keys():
        sequence += dssp[a_key][1]
        sec_structure += dssp[a_key][2]

    # print extracted sequence and structure
    print(i)
    print(sequence)
    print(sec_structure)
    #
    # The DSSP codes for secondary structure used here are:
    # =====     ====
    # Code      Structure
    # =====     ====
    # H         Alpha helix (4-12)
    # B         Isolated beta-bridge residue
    # E         Strand
    # G         3-10 helix
    # I         Pi helix
    # T         Turn
    # S         Bend
    # -         None
    # =====     ====
    #
    # if desired, convert DSSP's 8-state assignments into 3-state [C - coil, E - extended (beta-strand), H - helix]
    sec_structure = sec_structure.replace('-', 'C')
    sec_structure = sec_structure.replace('I', 'C')
    sec_structure = sec_structure.replace('T', 'C')
    sec_structure = sec_structure.replace('S', 'C')
    sec_structure = sec_structure.replace('G', 'H')
    sec_structure = sec_structure.replace('B', 'E')
    print(sec_structure)


The printout contains the PDB ID, the protein sequence, its original 8-state DSSP assignment, and the 3-state assignment obtained after conversion (the sequence line is not shown below).

1ako
-EEEEEE-S-GGG-HHHHHHHHHHH--SEEEEE-----GGG--HHHHHHTT-EEEEEEETTEEEEEEEESS--SEEEESSTT--HHHHTTEEEEEEEETTEEEEEEEEE-----BTT-TTHHHHHHHHHHHHHHHHHHH--TTS-EEEEEE-----SGGGB-S-HHHHHHHHHHTBTTS-HHHHHHHHHHHHTTEEEHHHHHSTT--S--SB--TTTTHHHHT--B--EEEEEEHHHHTTEEEEEE-HHHHTSSS--SB--EEEEE--
CEEEEEECCCHHHCHHHHHHHHHHHCCCEEEEECCCCCHHHCCHHHHHHCCCEEEEEEECCEEEEEEEECCCCCEEEECCCCCCHHHHCCEEEEEEEECCEEEEEEEEECCCCCECCCCCHHHHHHHHHHHHHHHHHHHCCCCCCEEEEEECCCCCCHHHECCCHHHHHHHHHHCECCCCHHHHHHHHHHHHCCEEEHHHHHCCCCCCCCCECCCCCCHHHHCCCECCEEEEEEHHHHCCEEEEEECHHHHCCCCCCCECCEEEEECC
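
To turn such strings into the FASTA-like records asked for in the question, something like the following could be used. It is a sketch only: `ss_fasta` is a made-up helper, and the demo reuses the two illustrative strings from the question rather than real DSSP results.

```python
def ss_fasta(records, width=0):
    """Format {pdb_id: ss_string} as FASTA-like text.

    width > 0 wraps each string at that many characters; 0 keeps one line.
    """
    out = []
    for pdb_id, ss in records.items():
        out.append(">%s" % pdb_id)
        if width > 0:
            # plain slicing, so '-' characters are never treated as word breaks
            out.extend(ss[k:k + width] for k in range(0, len(ss), width))
        else:
            out.append(ss)
    return "\n".join(out) + "\n"


records = {
    "3gja": "HHHHHHHHHHEEEEEEEEEEEECCCCCCCCHHHHHHEEEEEEEEE",
    "3akl": "CCCCCCCCCCCCEEEEEEEEEEEHHHHHHHHHHEEEEEEEEEEEECCCCCCCCHHHHHHEEEEEEEEE",
}
print(ss_fasta(records), end="")
```

In the script above, the dictionary would be filled inside the loop, e.g. with `records[i] = sec_structure`.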

0

Thank you kindly for this update. I am trying it now with my list of PDB IDs, but I have a question: if I want to specify a particular chain, how would I go about that?

0

It says in DSSP.py:

**Note** that DSSP can only handle one model, and will only run
calculations on the first model in the provided PDB file.


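For multiple chains, note that this line selects the first model, not a chain:

model = structure[0]


Chains are addressed by ID inside a model (for example model['A']), and the keys of the DSSP object are (chain ID, residue ID) tuples. The simplest approach is therefore to run DSSP on the whole model and keep only the entries whose chain ID matches the one you want. This is still something you should verify on your own structure.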
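
One way to pick out a single chain, relying on Bio.PDB's DSSP object being keyed by (chain ID, residue ID) tuples with the amino acid at index 1 and the secondary structure at index 2 of each value. The helper `ss_by_chain` and the stand-in dictionary are hypothetical; the stand-in only mimics the shape of the real object so the example runs without DSSP installed.

```python
def ss_by_chain(dssp_like):
    """Group sequence and secondary structure by chain ID.

    dssp_like: any mapping with keys (chain_id, residue_id) and values
    whose index 1 is the amino acid and index 2 the SS code.
    """
    chains = {}
    for key in dssp_like.keys():
        chain_id = key[0]
        entry = dssp_like[key]
        seq, ss = chains.get(chain_id, ("", ""))
        chains[chain_id] = (seq + entry[1], ss + entry[2])
    return chains


# stand-in for a DSSP object: two residues in chain A, one in chain B
fake_dssp = {
    ("A", (" ", 1, " ")): (1, "M", "-"),
    ("A", (" ", 2, " ")): (2, "K", "H"),
    ("B", (" ", 1, " ")): (3, "V", "E"),
}
per_chain = ss_by_chain(fake_dssp)
print(per_chain["A"])
print(per_chain["B"])
```

With the script above, `ss_by_chain(dssp)['A']` should give the sequence and assignment for chain A only.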

0

I am passing in a text file of PDB IDs separated by newlines. I can separate them with commas instead if that is what the code needs, but this is not specified. Anyway, I am getting this error:

FileNotFoundError: [Errno 2] No such file or directory: './pdb/Users/data/Desktop/PDB/pdblist.ent'


It seems to be inserting the whole path into the file name, which is why the file is not found.

# initialize PDB downloader
pdb_dl = PDBList()
pdb_list = ['/Users/fcatalogne/Desktop/PDB/pdblist']
for i in pdb_list:
    pdb_dl.retrieve_pdb_file(i, pdir='./', file_format='pdb', overwrite=True)

# parse structure
p = PDBParser()
for i in pdb_list:
    structure = p.get_structure(i, './pdb%s.ent' % i)
    # use only the first model
    model = structure[0]
    # calculate DSSP
    dssp = DSSP(model, './pdb%s.ent' % i)
    ...

1

You need to know Python basics to be able to do this. If you don't, I suggest you get acquainted with the language first, as I am not able to troubleshoot your errors line by line.

In this particular case, a list of PDB IDs is needed, like this:

['1ako', '4e5w', '3cro', '2f2f']


What you are trying to do is point at a list of PDB files on your disk, which won't work here. It is possible to process a bunch of files that are already on your disk, but that requires modifications to my code. I am afraid you will have to figure it out on your own. You have guidelines in my code, it is simply a matter of opening a list of existing PDB files one by one.
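
For a text file that holds the IDs themselves (one per line or comma-separated), a sketch of turning it into the required list might look like this. `read_pdb_ids` is a made-up helper, and it assumes classic 4-character PDB IDs that start with a digit; IDs are lowercased so they match the pdbXXXX.ent file names used in the script.

```python
import os
import re
import tempfile


def read_pdb_ids(path):
    """Read PDB IDs separated by newlines, commas or other whitespace."""
    with open(path) as handle:
        tokens = re.split(r"[,\s]+", handle.read())
    ids = []
    for token in tokens:
        token = token.strip("'\"").lower()
        # assumption: classic IDs are a digit followed by three alphanumerics
        if re.fullmatch(r"[0-9][a-z0-9]{3}", token):
            ids.append(token)
    return ids


# demo with a throwaway file standing in for the user's ID list
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1ako\n4E5W, 3cro\n\n'2f2f'\n")
ids = read_pdb_ids(tmp.name)
os.remove(tmp.name)
print(ids)
```

The result can then be assigned to `pdb_list` in place of the hard-coded list.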

0

Yes, but even with this I am getting a "Desired structure doesn't exists" error. I tried putting the actual PDB ID in the parameter, and also putting the IDs in a text file as a list of strings, and the problem persists. I googled it and found something about VPNs, but I am not using one. Have you ever heard of this? GenoMax, have you come across it?

 Downloading PDB structure '/Users/user/Desktop/PDB/mockpdb.txt/'...
Desired structure doesn't exists
['/Users/user/Desktop/PDB/mockpdb.txt/']


The two mock PDB IDs I am using are ["6x17", "7lol"]. I checked, and both return HTTP 200.

WAIT, must it be lower-case letters? nope, same error

0

Again, the list of PDB IDs needs to be in the format I specified, not a text file containing the list.

As to the list thing you tried, it works just fine (see the script output below for ['7lol', '6x17']). That means you need to do some troubleshooting regarding the download. I don't want to sound harsh, but this problem requires that you have some basic knowledge about (bio)python, DSSP, PDB structure formats, etc. It is impossible to troubleshoot every single error or problem you might encounter without a massive time investment on our part.

Downloading PDB structure '7lol'...
7lol
--S-S--SGGG--B-S-GGGSS-SEEEEE-------SS----S-HHHHHHHHTHHHHT-SS-TT-SS-HHHH--EEEEEE-GGGGS-HHHHHHHHHHHHHHHHHTT-EEEEE-SSGGGHHHHHHHHHHHH-SBEEEEE-SS-----SS-TT-TT-HHHHHHHTTSB-GGG-EEES--S---TTSS-EEE-HHHHHHS-HHHHHHHHHHHHTTS-EEEEEEGGGB-TTT---SSS--SS-B-HHHHHHHHHHTTTS-EEEEEEE---GGG-STTHHHHHHHHHHHHHHHHHHTT-
CCCCCCCCHHHCCECCCHHHCCCCEEEEECCCCCCCCCCCCCCCHHHHHHHHCHHHHCCCCCCCCCCCHHHHCCEEEEEECHHHHCCHHHHHHHHHHHHHHHHHCCCEEEEECCCHHHHHHHHHHHHHHHCCEEEEEECCCCCCCCCCCCCCCCCHHHHHHHCCCECHHHCEEECCCCCCCCCCCCEEECHHHHHHCCHHHHHHHHHHHHCCCCEEEEEEHHHECCCCCCCCCCCCCCCECHHHHHHHHHHCCCCCEEEEEEECCCHHHCCCCHHHHHHHHHHHHHHHHHHCCC
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 9671.
PDBConstructionWarning)
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 9714.
PDBConstructionWarning)
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 9757.
PDBConstructionWarning)
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 9800.
PDBConstructionWarning)
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 9801.
PDBConstructionWarning)
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:91: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 9802.
PDBConstructionWarning)
6x17
-HHHHHHHS-HHHHHHHHHHHHHHHHHHHHHTT-HHHHHHHTHHHHHHHHHHHHHTHHHHHHHHHHTTT-S-TTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHS-----------S----------HHHHHHSSS-S-HHHHHHS--HHHHHHHHHHHHHHHHHHTTSS-SSHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHSSSTTSSSHHHHHHHHHHHHHHHHHHHHHHHHHHTT--HHHHHHHHHHHHHHHHHH--SSTTHHHHHHHHHTTT--HHHHHHHHHHHTTS--HHHHHHHHHHHHHHHHHTT----TT-HHHHHHHHHHHHHHSSS-SSHHHHHHHHHHHHHT--TTSHHHHHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHS-HHHHHHHHHHHHHHHHHHHHHTT-HHHHHHHTHHHHHHHHHHHHHTHHHHHHHHHHTTT-S-TTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHS-----------S----------HHHHHHSSS-S-HHHHHHS--HHHHHHHHHHHHHHHHHHTTSS-SSHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHSSSTTSSSHHHHHHHHHHHHHHHHHHHHHHHHHHTT--HHHHHHHHHHHHHHHHHH--SSTTHHHHHHHHHTTT--HHHHHHHHHHHTTS--HHHHHHHHHHHHHHHHHTT----TT-HHHHHHHHHHHHHHSSS-SSHHHHHHHHHHHHHT--TTSHHHHHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHS-HHHHHHHHHHHHHHHHHHHHHTT-HHHHHHHTHHHHHHHHHHHHHTHHHHHHHHHHTTT-S-TTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHS-----------S----------HHHHHHSSS-S-HHHHHHS--HHHHHHHHHHHHHHHHHHTTSS-SSHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHSSSTTSSSHHHHHHHHHHHHHHHHHHHHHHHHHHTT--HHHHHHHHHHHHHHHHHH--SSTTHHHHHHHHHTTT--HHHHHHHHHHHTTS--HHHHHHHHHHHHHHHHHTT----TT-HHHHHHHHHHHHHHSSS-SSHHHHHHHHHHHHHT--TTSHHHHHHHHHHGGGHHHHHHHHHHHHHHHHHHHHHHHHHH-
CHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHCHHHHHHHHHHHHHCHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCCCCHHHHHHCCCHHHHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHCCCCCHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHCHHHHHHHHHHHHHCHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCCCCHHHHHHCCCHHHHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHCCCCCHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHCHHHHHHHHHHHHHCHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCCCCHHHHHHCCCHHHHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHCCCCCHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHC

0

Hello, Mensur

Thanks so much for your help. However, when I follow your code exactly, I run into an error that I can't fix, and I wonder if you can give me some advice. The error happens every time I call DSSP.

dssp = DSSP(model, './pdb%s.ent' % i)


AssertionError                            Traceback (most recent call last)
<ipython-input-58-45292e17bc7e> in <module>
15     model = structure[0]
16     # calculate DSSP
---> 17     dssp = DSSP(model, './pdb%s.ent' % i)

~\anaconda3\lib\site-packages\Bio\PDB\DSSP.py in __init__(self, model, in_file, dssp, acc_array, file_type)
429             "MMCIF",
430             "DSSP",
--> 431         ], "File type must be PDB, mmCIF or DSSP"
432         # If the input file is a PDB or mmCIF file run DSSP and parse output:
433         if file_type == "PDB" or file_type == "MMCIF":

AssertionError: File type must be PDB, mmCIF or DSSP


This is really confusing because the file I use is the pdb.ent file of the corresponding protein. My entire code is below (copied from yours):

from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP
from Bio.PDB import PDBList
pdb_dl = PDBList()
pdb_list = ['10gs']
for i in pdb_list:
    pdb_dl.retrieve_pdb_file(i, pdir='./', file_format='pdb', overwrite=True)

# parse structure
p = PDBParser()
for i in pdb_list:
    structure = p.get_structure(i, './pdb%s.ent' % i)
    # use only the first model
    model = structure[0]
    # calculate DSSP
    dssp = DSSP(model, './pdb%s.ent' % i)

0

I think this has something to do with the newest Biopython version, as the parsing may have changed in version 1.79. When I run your code with python3 and v1.79, I get the same error message as you. With python2 and v1.76, the code runs without a problem. I suggest you try python2, or python3 with an earlier Biopython version.

Downloading PDB structure '10gs'...
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3678.
PDBConstructionWarning,
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3723.
PDBConstructionWarning,
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3768.
PDBConstructionWarning,
/usr/local/lib/python2.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3861.
PDBConstructionWarning,
10gs
-EEEEE-SS-GGGHHHHHHHHHTT--EEEEE--HHHHHTSHHHHHSTTS-S-EEEETTEEEESHHHHHHHHHHHHT-S-SSHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHHHHHHHHHHHHHHHHHHHHHTTGGGTS-SSTTS--HHHHHHHHHHHHHHHHSTTTTTT-HHHHHHHHHHHTSHHHHHHHHSHHHHTS-SSTTS---EEEEE-SS-GGGHHHHHHHHHTT--EEEEE--HHHHHTSHHHHHSTTS-S-EEEETTEEEESHHHHHHHHHHHHT-S-SSHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHHHHHHHHHHHHHHHHHHHHHTTGGGTS-SSTTS--HHHHHHHHHHHHHHHHSTTTTTT-HHHHHHHHHHHTSHHHHHHHHSHHHHTS-SSTTS----
CEEEEECCCCHHHHHHHHHHHHCCCCEEEEECCHHHHHCCHHHHHCCCCCCCEEEECCEEEECHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHHHHHHHHHHHHHHHHHHHCCHHHCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCHHHHHHHHCHHHHCCCCCCCCCCCEEEEECCCCHHHHHHHHHHHHCCCCEEEEECCHHHHHCCHHHHHCCCCCCCEEEECCEEEECHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHHHHHHHHHHHHHHHHHHHCCHHHCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCHHHHHHHHCHHHHCCCCCCCCCCCC

0

I found a solution that works with python3. One line needs to be changed.

From:

dssp = DSSP(model, './pdb%s.ent' % i)


To:

dssp = DSSP(model, './pdb%s.ent' % i, file_type='PDB')
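
The assertion in the traceback checks that the file type is one of PDB, MMCIF or DSSP. Assuming (this is an inference, not confirmed in this thread) that newer Biopython versions derive the type from the file extension when `file_type` is not given, a name like pdb10gs.ent would fail that check. A small hypothetical helper that makes the choice explicit; the extension table is my own guess at sensible mappings:

```python
import os

# assumed mapping from file extension to the file_type values named in the
# assertion message; '.ent' is what PDBList writes for legacy PDB files
EXT_TO_TYPE = {".pdb": "PDB", ".ent": "PDB", ".cif": "MMCIF", ".dssp": "DSSP"}


def dssp_file_type(path):
    """Return 'PDB', 'MMCIF' or 'DSSP' for a file name, or raise ValueError."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return EXT_TO_TYPE[ext]
    except KeyError:
        raise ValueError("cannot infer file type from %r" % path)


print(dssp_file_type("./pdb10gs.ent"))
```

The call would then read `DSSP(model, path, file_type=dssp_file_type(path))`.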

0

Hello, Mensur

Thanks so much for your help. It took me some time to set up the Python 2.7 environment. However, I ran into this error:

WindowsError: [Error 2] The system cannot find the file specified


when using

dssp = DSSP(model, path)


in python 2.7 and biopython v1.76

AND using

dssp = DSSP(model, path, file_type='PDB')


in python 3

I understand this to mean that the file location is wrong. However, the same path works perfectly in this command:

structure = p.get_structure(i, path)


Aren't the paths exactly the same?

I also used readlines() to check the path, and it is the right one.

Original code:

p = PDBParser()
for i in pdb_list:
    path = './pdb%s.ent' % i
    #print(path)
    structure = p.get_structure(i, path)
    # use only the first model
    model = structure[0]
    # checking the path
    #with open(path) as f:
    #    print(lines)
    # calculate DSSP
    dssp = DSSP(model, path)


Error:

WindowsErrorTraceback (most recent call last)
<ipython-input-22-457e24ca13dc> in <module>()
20     #    print(lines)
21     # calculate DSSP
---> 22     dssp = DSSP(model, path)

C:\Users\amber\anaconda3\envs\py27\lib\site-packages\Bio\PDB\DSSP.pyc in __init__(self, model, in_file, dssp, acc_array, file_type)
436                 else:
437                     raise
--> 438             dssp_dict, dssp_keys = dssp_dict_from_pdb_file(in_file, dssp)
439         # If the input file is a DSSP file just parse it directly:
440         elif file_type == "DSSP":

C:\Users\amber\anaconda3\envs\py27\lib\site-packages\Bio\PDB\DSSP.pyc in dssp_dict_from_pdb_file(in_file, DSSP)
234             universal_newlines=True,
235             stdout=subprocess.PIPE,
--> 236             stderr=subprocess.PIPE,
237         )
238     except OSError:  # TODO: Use FileNotFoundError once drop Python 2

C:\Users\amber\anaconda3\envs\py27\lib\subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
395         except Exception:
396             # Preserve original exception in case os.close raises.

C:\Users\amber\anaconda3\envs\py27\lib\subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
642                                          env,
643                                          cwd,
--> 644                                          startupinfo)
645             except pywintypes.error, e:
646                 # Translate pywintypes.error to WindowsError, which is

WindowsError: [Error 2] The system cannot find the file specified


I am sorry for the stupid question. I can't fix it.

0

I don't know how to troubleshoot python under Windows. I already gave you a solution above that should work for python3.

0

Thanks for your help. I'll try my best. Thanks again.

0

Hello, Mensur

Sorry to interrupt you again. I finally solved my problem. It was caused by my unfamiliarity with the Biopython package, and especially with the DSSP program (it's my first time using it). Thanks again for your help.
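
For anyone hitting the same WindowsError: the traceback ends inside subprocess, which suggests that the DSSP executable could not be launched rather than that the PDB path was wrong. A quick check, assuming the binary is called mkdssp or dssp (the names may differ on your system, and `find_dssp_executable` is a made-up helper):

```python
import shutil


def find_dssp_executable(candidates=("mkdssp", "dssp")):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


exe = find_dssp_executable()
if exe is None:
    print("No DSSP executable found on PATH; install DSSP or use its full path.")
else:
    print("DSSP executable:", exe)
```

The DSSP class takes the executable through its `dssp` argument (visible in the `__init__` signature in the traceback above), so a call such as `DSSP(model, path, dssp=exe, file_type='PDB')` should point Biopython at the right program.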