Hi friends, I am just a beginner in Python. I want to analyze the secondary structure properties of hundreds of protein sequences. I have already obtained the secondary structure and disordered region predictions as separate text files from two different servers. The two files look like this:
Secondary structure file (ss.txt)
Index AA SS ASA Phi Psi Theta(i-1=>i+1) Tau(i-2=>i+1) P(C) P(E) P(H)
1 M C 131.5 -93.4 122.8 112.2 -162.9 0.928 0.031 0.047
2 H C 114.1 -96.4 121.2 112.4 -161.9 0.683 0.207 0.080
3 I C 83.0 -86.7 106.4 107.8 -157.8 0.556 0.326 0.092
4 Q H 114.2 -82.3 -18.9 107.9 157.8 0.586 0.180 0.173
5 S H 77.6 -88.9 41.5 108.3 99.6 0.624 0.199 0.139
6 L C 95.5 -81.4 25.1 102.4 -177.8 0.860 0.052 0.071
7 G S 48.5 91.0 -16.2 110.2 -29.4 0.843 0.044 0.113
8 A S 57.7 -83.1 136.6 113.8 91.4 0.800 0.112 0.106
Disorder file (dis.txt)
Index AA Binary Probability
1 M D 0.97272
2 H D 0.96426
3 I O 0.96352
4 Q O 0.96778
5 S O 0.97184
6 L D 0.97648
7 G D 0.97955
8 A O 0.98359
I want to write a script that gives priority to disorder: wherever a residue is flagged as disordered (D), the secondary structure element at that position (H, S, or C) should be replaced with D. Finally, the script should calculate the percentage of H, S, C, and D residues for each sequence. I have saved the secondary structure files as 1.ss.txt, 2.ss.txt, 3.ss.txt, etc., and the disorder files as 1.dis.txt, 2.dis.txt, 3.dis.txt, etc., so 1.ss.txt and 1.dis.txt both belong to protein no. 1.
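To make the goal concrete, here is a rough sketch of the kind of script I have in mind. It is only an illustration under some assumptions: both files are whitespace-separated with one header line, the state letter sits in the third column of each file, "D" in the Binary column means disordered, and all files are in one folder. The function names (read_column, merge_states, percentages, main) are made up for this example.

```python
from collections import Counter
from pathlib import Path


def read_column(path, col=2):
    """Return column `col` (0-based) of a whitespace-separated file, skipping the header."""
    with open(path) as fh:
        next(fh, None)  # skip the header line (assumed to be exactly one line)
        return [line.split()[col] for line in fh if line.strip()]


def merge_states(ss, dis):
    """Give disorder priority: use 'D' wherever the disorder call is 'D'."""
    if len(ss) != len(dis):
        raise ValueError("the two files have different numbers of residues")
    return ["D" if d == "D" else s for s, d in zip(ss, dis)]


def percentages(states, labels="HSCD"):
    """Percentage of each label among all residues (0.0 for an empty sequence)."""
    counts = Counter(states)
    total = len(states)
    return {lab: (100.0 * counts[lab] / total if total else 0.0) for lab in labels}


def main(folder="."):
    # Files are visited in name order, so 10.ss.txt comes before 2.ss.txt.
    for ss_path in sorted(Path(folder).glob("*.ss.txt")):
        # Assumed naming scheme: N.ss.txt pairs with N.dis.txt
        dis_path = ss_path.with_name(ss_path.name.replace(".ss.txt", ".dis.txt"))
        if not dis_path.exists():
            print(f"{ss_path.name}: no matching disorder file, skipped")
            continue
        merged = merge_states(read_column(ss_path), read_column(dis_path))
        pct = percentages(merged)
        print(ss_path.name, " ".join(f"{k}={v:.1f}%" for k, v in pct.items()))


if __name__ == "__main__":
    main()
```

For the eight residues shown above, the merged states would be D D C H H D D S, which gives H = 25.0%, S = 12.5%, C = 12.5%, and D = 50.0%.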