Computing the percentage of secondary-structure and disordered residues from two text files
7.8 years ago

Hi friends, I am a beginner in Python. I want to analyse the secondary structural properties of hundreds of protein sequences. I have already obtained the secondary structure and disordered region analyses as separate text files from two different servers. The two kinds of file look as follows:

Secondary structure file (ss.txt)

Index  AA  SS  ASA    Phi    Psi    Theta(i-1=>i+1)  Tau(i-2=>i+1)  P(C)   P(E)   P(H)
1      M   C   131.5  -93.4  122.8  112.2            -162.9         0.928  0.031  0.047
2      H   C   114.1  -96.4  121.2  112.4            -161.9         0.683  0.207  0.080
3      I   C   83.0   -86.7  106.4  107.8            -157.8         0.556  0.326  0.092
4      Q   H   114.2  -82.3  -18.9  107.9            157.8          0.586  0.180  0.173
5      S   H   77.6   -88.9  41.5   108.3            99.6           0.624  0.199  0.139
6      L   C   95.5   -81.4  25.1   102.4            -177.8         0.860  0.052  0.071
7      G   S   48.5   91.0   -16.2  110.2            -29.4          0.843  0.044  0.113
8      A   S   57.7   -83.1  136.6  113.8            91.4           0.800  0.112  0.106

Disorder file (dis.txt)

Index  AA  Binary  Probability
1      M   D       0.97272
2      H   D       0.96426
3      I   O       0.96352
4      Q   O       0.96778
5      S   O       0.97184
6      L   D       0.97648
7      G   D       0.97955
8      A   O       0.98359

Giving priority to disorder, I want to write a script that replaces the secondary structure element (H, S or C) with a disordered element (D) wherever the disorder file marks a residue as disordered, and then calculates the percentage of H, S, C and D residues for every sequence. I have saved the secondary structure files as 1.ss.txt, 2.ss.txt, 3.ss.txt, etc., and the disorder files as 1.dis.txt, 2.dis.txt, 3.dis.txt, etc.; 1.ss.txt and 1.dis.txt both belong to protein no. 1.
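One possible way to approach this is sketched below. It is not a definitive solution and rests on a few assumptions that are not stated in the post: both files are whitespace-delimited, the state of interest sits in the third column of each file, a `D` in the Binary column means "disordered" (and `O` means "ordered"), and all files live in one directory under the `N.ss.txt` / `N.dis.txt` naming scheme. The function names (`read_states`, `merge_states`, `percentages`, `summarise`) are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def read_states(path, column):
    """Return {residue index: value in `column`} for one whitespace-delimited file."""
    states = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and the header row (first field is not an integer).
            if len(fields) <= column or not fields[0].isdigit():
                continue
            states[int(fields[0])] = fields[column]
    return states


def merge_states(ss, dis, disorder_flag="D"):
    """Give disorder priority: use 'D' wherever the disorder file flags the residue.

    Assumes `disorder_flag` in the Binary column marks a disordered residue.
    """
    return [
        "D" if dis.get(index) == disorder_flag else state
        for index, state in sorted(ss.items())
    ]


def percentages(states, labels=("H", "S", "C", "D")):
    """Percentage of each label among all residues of one protein."""
    counts = Counter(states)
    total = len(states)
    return {
        label: (100.0 * counts[label] / total) if total else 0.0
        for label in labels
    }


def summarise(directory="."):
    """Process every N.ss.txt that has a matching N.dis.txt in `directory`."""
    results = {}
    for ss_path in sorted(Path(directory).glob("*.ss.txt")):
        protein = ss_path.name[: -len(".ss.txt")]
        dis_path = ss_path.with_name(protein + ".dis.txt")
        if not dis_path.exists():
            continue  # no disorder file for this protein; skip it
        merged = merge_states(read_states(ss_path, 2), read_states(dis_path, 2))
        results[protein] = percentages(merged)
    return results


if __name__ == "__main__":
    for protein, pct in summarise().items():
        print(protein, " ".join(f"{k}={v:.1f}%" for k, v in pct.items()))
```

For the eight residues shown above, the merged sequence would be D D C H H D D S, which gives D = 50%, H = 25%, C = 12.5% and S = 12.5%. Residues with any other state (for example E, if a server reports it) still count towards the total but are not listed unless the label is added to `labels`.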

sequence alignment secondary structural analysis