Question

pdb to dssp

7.2 years ago

uday4vijay • 0

I have downloaded the RS126 dataset, and when I extracted it I got a bunch of files with the .pdb extension. I don't know how to open these files. I want to give these files as input to a neural network in MATLAB for protein secondary structure prediction. I found in some papers that the sequence must be encoded into matrix format before it is given as input. Could anyone please let me know how to do this in MATLAB?

Any help is appreciated. Thanks in advance.

sequence • 3.3k views

ADD COMMENT • link updated 7.2 years ago by Petr Ponomarenko ★ 2.8k • written 7.2 years ago by uday4vijay • 0

.pdb files should be plain text. You can open them in any text editor.
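Because the files are plain text, the sequence can also be read programmatically. Here is a minimal sketch in Python (not MATLAB, though the same logic ports over) that collects the residues listed in SEQRES records of PDB-format text. It assumes the files contain SEQRES records at the standard fixed columns, which may not hold for every file in the dataset. The function name `sequences_from_pdb_text` and the example lines are made up for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_pdb_text(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    Assumes standard PDB columns: chain ID in column 12 and residue
    names starting at column 20. Unknown residue names become 'X'.
    """
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]
        for name in line[19:70].split():
            chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(name, "X"))
    return {chain: "".join(residues) for chain, residues in chains.items()}


# Hypothetical two-chain example, not taken from the dataset.
example = (
    "SEQRES   1 A    5  GLY ILE VAL GLU GLN\n"
    "SEQRES   1 B    3  PHE VAL ASN\n"
)
print(sequences_from_pdb_text(example))
```

If a file has no SEQRES records, the sequence would have to be derived from the ATOM records instead, which this sketch does not attempt.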

ADD REPLY • link 7.2 years ago by GenoMax 141k

score 0 · Answer 1 · 2017-03-01

7.2 years ago

Petr Ponomarenko ★ 2.8k

PDB-to-DSSP conversion is provided by the DSSP program itself: http://www.cmbi.ru.nl/dssp.html
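DSSP assigns each residue a one-character code (H, G, I, E, B, T, S, or blank), which is more detail than many prediction setups use. Below is a rough Python sketch of collapsing those codes into three classes. The mapping (H, G, I to helix; E, B to strand; everything else to "other") is one common convention, assumed here rather than taken from this thread. The output letters `A`, `B` and `-` are an arbitrary choice, and `reduce_to_three_states` is a hypothetical helper name.

```python
# Assumed reduction: helix-like and strand-like DSSP codes are merged,
# and all remaining codes (turn, bend, blank/loop, ...) become "other".
HELIX_CODES = set("HGI")
STRAND_CODES = set("EB")


def reduce_to_three_states(dssp_codes):
    """Map a string of per-residue DSSP codes to 'A' (helix),
    'B' (strand) or '-' (everything else), one character per residue."""
    reduced = []
    for code in dssp_codes:
        if code in HELIX_CODES:
            reduced.append("A")
        elif code in STRAND_CODES:
            reduced.append("B")
        else:
            reduced.append("-")
    return "".join(reduced)


# Made-up DSSP code string for illustration.
print(reduce_to_three_states("  HHHHGGTT EEEEBS "))
```

Other reductions exist (for example, treating only H as helix and only E as strand), and the choice affects reported accuracy, so it is worth stating which one you use.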

I am a bit confused with your question. What do you want us to help you with?

Help you understand the pdb file format? genomax2 pointed you to the documentation, and there is a secondary structure section in it.

Translate the information into a matrix for analysis? For this we need a clear understanding of the output matrix you want, with an input/output example.

Do you want us to make and run a neural network in MATLAB for you? This is most likely beyond the scope of a question-and-answer forum.

I would like to know whether I need to pre-process the data before giving it as input to the network and, if so, how. For example, should I convert it to matrix form and give the matrix as input to the neural network? How can I do that?

Thanks.

ADD REPLY • link 7.2 years ago by uday4vijay • 0

You can start with a simple model of secondary structure made of three types of elements: alpha helices, beta sheets, and everything else. Represent the secondary structure as a string aligned with the peptide sequence, e.g. ------AAAAA-----BBBBB-

Both alpha helices and beta sheets are local structures that depend on local sequence, so you do not need a super complicated model. You can start by predicting alpha helices, since they are largely defined by very short patterns in the sequence (learning about secondary structures and folding will help you).

If you want to use neural networks for some reason, you can give as input one set of peptide subsequences that form alpha helices and another set of subsequences that do not, and train your network to distinguish between the two. Use only part of each set for training; the leftover sequences can be used to test your network. Then you can add another block to your network to distinguish beta sheets from everything else that is neither beta sheet nor alpha helix. You can improve your network that way, module by module. You may want to first reduce the dependence on subsequence length by starting with subsequences of the same length.
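As a sketch of what "matrix form" could look like for fixed-length subsequences, the Python below one-hot encodes a sliding window around each residue and pairs it with that residue's class. It is written in Python rather than MATLAB, but the resulting rows-by-features layout can be rebuilt there. The window size, the zero-padding at the ends, the `A`/`B`/`-` labels and all names (`encode_windows`, `one_hot`) are assumptions for illustration, not a prescribed method.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
CLASSES = "AB-"                       # helix, sheet, everything else


def one_hot(symbol, alphabet):
    """Return a 0/1 list with a single 1 at the symbol's position.
    Symbols outside the alphabet give an all-zero vector."""
    return [1 if symbol == letter else 0 for letter in alphabet]


def encode_windows(sequence, structure, window=5):
    """Build one input row and one target row per residue.

    Each input row concatenates the one-hot vectors of the `window`
    residues centred on that position (window * 20 values).
    Positions past either end of the sequence are zero-padded.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre residue")
    half = window // 2
    padded = "X" * half + sequence + "X" * half  # 'X' encodes as zeros
    inputs, targets = [], []
    for i, label in enumerate(structure):
        row = []
        for residue in padded[i:i + window]:
            row.extend(one_hot(residue, AMINO_ACIDS))
        inputs.append(row)
        targets.append(one_hot(label, CLASSES))
    return inputs, targets


# Made-up example: five residues with a short helix in the middle.
X, Y = encode_windows("GIVEQ", "-AAA-", window=3)
print(len(X), len(X[0]), Y[1])
```

Every row has the same length regardless of where the window falls, which matches the suggestion to start with pieces of equal length. Holding out some proteins entirely for testing, rather than some windows, avoids near-identical rows landing in both sets.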

ADD REPLY • link 7.2 years ago by Petr Ponomarenko ★ 2.8k

I'm just wondering why you need to do this. Is it just because you can, or to educate yourself (which is fine)? Secondary structure prediction is pretty good these days already...

Also, I can't help but think that if these questions are already causing you issues, how do you plan to implement something as complicated as a neural network?

I don't mean to sound harsh - these just seem like some very early/simple stumbling blocks to be having?

ADD REPLY • link 7.2 years ago by Joe 21k