Question: Extract Data From DSSP Output
3.8 years ago, edwee wrote:

I have some DSSP files in a folder f1. I want to extract the lines from these files only if the values in the PHI and PSI columns satisfy -99 <= phi <= -67 and 100 <= psi <= 165. I would like to save the output to another folder, f2, under the input file names. I think this is very difficult to do with awk. I would highly appreciate your suggestions.

    #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA 
    1   98 A E              0   0  236      0, 0.0     2,-0.2     0, 0.0    21,-0.0   0.000 360.0 360.0 360.0 145.2   53.5   26.9    4.7
    2   99 A I        -     0   0   96     21,-0.0     2,-0.4    19,-0.0    21,-0.1  -0.518 360.0-159.2 -82.9 141.2   50.2   25.1    4.2
    3  100 A I        -     0   0   34     -2,-0.2    19,-2.4   154,-0.2     2,-0.6  -0.930   8.8-148.0-115.9 140.0   47.5   24.6    6.9
    4  101 A Q  E     -A   21   0A  60     -2,-0.4     2,-0.4    17,-0.2    17,-0.2  -0.948  11.6-149.4-114.1 115.7   44.9   21.9    6.5
    5  102 A I  E     -A   20   0A   0     15,-3.1    15,-2.8    -2,-0.6    50,-0.2  -0.704  20.9-116.4 -87.8 128.1   41.5   22.7    8.1
    6  103 A T        -     0   0   17     48,-3.0    13,-0.1    -2,-0.4    40,-0.1  -0.322  11.7-154.3 -64.4 140.8   39.4   19.9    9.4
    7  104 A T        -     0   0    1      2,-0.4   200,-0.3    39,-0.1    -1,-0.1   0.529  48.6-105.8 -88.9 -10.1   36.1   19.2    7.7
    8  105 A G  S    S+     0   0    9      1,-0.2     2,-0.5   198,-0.1   201,-0.3   0.196  97.2  94.5 101.9 -14.8   34.7   17.6   10.9
    9  106 A S     >  -     0   0    0    199,-0.2     4,-2.6     1,-0.1    -2,-0.4  -0.951  61.4-156.7-114.9 124.2   35.0   14.1    9.4
   10  107 A K  H  > S+     0   0  153     -2,-0.5     4,-2.2     1,-0.2    -1,-0.1   0.904  98.4  47.9 -62.8 -42.3   38.1   11.9   10.0
   11  108 A E  H  > S+     0   0   90      2,-0.2     4,-2.0     1,-0.2    -1,-0.2   0.895 113.1  46.7 -66.4 -41.5   37.5   10.0    6.8
   12  109 A L  H  > S+     0   0    1      2,-0.2     4,-1.8     1,-0.2    -2,-0.2   0.902 111.5  52.5 -68.1 -39.3   36.9   13.1    4.7
   13  110 A D  H  <>S+     0   0   23     -4,-2.6     5,-3.4     1,-0.2     6,-0.3   0.913 108.5  50.8 -61.1 -42.8   40.1   14.6    6.2
   14  111 A K  H ><5S+     0   0  140     -4,-2.2     3,-1.8     1,-0.2    -1,-0.2   0.915 108.2  51.4 -61.3 -43.2   42.1   11.5    5.2
   15  112 A L  H 3<5S+     0   0   31     -4,-2.0    -1,-0.2     1,-0.3    -2,-0.2   0.853 111.2  49.1 -61.2 -34.4   40.8   11.7    1.7
   16  113 A L  T ><5S-     0   0   15     -4,-1.8     3,-1.1    -5,-0.1    -1,-0.3   0.210 118.2-113.9 -89.9  13.4   41.9   15.3    1.6
   17  114 A Q  T < 5S-     0   0  157     -3,-1.8    -3,-0.2     1,-0.3    -2,-0.1   0.811  92.9 -18.2  59.0  32.6   45.3   14.4    3.0
   18  115 A G  T 3 <S-     0   0   37     -5,-3.4     2,-0.3    -6,-0.1    -1,-0.3  -0.081 117.1 -71.2 132.9 -36.9   44.6   16.3    6.2
   19  116 A G  S <  S-     0   0    3     -3,-1.1     2,-0.4    -6,-0.3   -13,-0.2  -0.957  84.5 -10.3 147.2-164.0   41.7   18.6    5.4
   20  117 A I  E    S-A    5   0A   1    -15,-2.8   -15,-3.1    -2,-0.3     2,-0.4  -0.545  71.0-128.8 -72.0 125.3   40.9   21.7    3.4
   21  118 A E  E >   -A    4   0A  59     -2,-0.4     3,-0.9   -17,-0.2   139,-0.4  -0.594   8.8-138.2 -83.6 127.4   44.1   23.3    2.2
   22  119 A T  T 3  S+     0   0    8    -19,-2.4   137,-0.3    -2,-0.4   136,-0.2  -0.496  87.2  38.6 -76.2 150.6   45.0   27.0    2.7
   23  120 A G  T 3  S+     0   0   26    135,-1.8     2,-0.3     1,-0.2    -1,-0.2   0.450 110.5  59.5  91.1  -0.5   46.5   28.7   -0.3
   24  121 A S  S <  S-     0   0    2     -3,-0.9   136,-2.3   134,-0.2     2,-0.4  -0.919  83.5 -95.1-149.5 174.3   44.4   27.0   -3.0
   25  122 A I  E     -b  160   0B   9     -2,-0.3   146,-2.4   134,-0.2     2,-0.5  -0.795  21.0-162.6-100.0 136.0   40.9   26.5   -4.2
   26  123 A T  E     -bc 161 171B   2    134,-2.8   136,-2.5    -2,-0.4     2,-0.6  -0.988  13.5-161.8-114.8 118.5   38.7   23.5   -3.2
   27  124 A E  E     -bc 162 172B  13    144,-3.0   146,-2.9    -2,-0.5     2,-0.5  -0.920   2.6-161.9-108.0 120.3   35.8   23.3   -5.6
   28  125 A X  E     -bc 163 173B   0    134,-3.0   136,-2.5    -2,-0.6     2,-0.4  -0.889   2.6-154.7-108.5 125.8   32.8   21.1   -4.5
   29  126 A F  E     + c   0 174B  42    144,-3.3   146,-2.6    -2,-0.5     2,-0.2  -0.814  44.0  83.6-100.8 135.6   30.2   19.8   -6.9
   30  127 A G        -     0   0   21    134,-1.8   134,-0.1    -2,-0.4     3,-0.1  -0.636  67.7 -89.6 149.1 153.7   26.7   19.0   -5.9
   31  128 A E    >   -     0   0  150     -2,-0.2     3,-2.2   144,-0.2     5,-0.4  -0.318  64.3 -67.9 -81.1 168.1   23.2   20.4   -5.2
   32  129 A F  T 3  S+     0   0   85      1,-0.3    -1,-0.2     2,-0.1    -2,-0.0  -0.304 124.8  30.0 -55.8 139.2   22.0   21.8   -1.9
   33  130 A R  T 3  S+     0   0  178     -3,-0.1    -1,-0.3     1,-0.0   169,-0.1   0.201  88.2  98.8  92.3 -11.9   21.7   18.9    0.6
   34  131 A T  S <  S-     0   0   22     -3,-2.2   142,-0.2   141,-0.1   150,-0.1   0.609 110.4 -96.8 -77.2 -10.7   24.5   16.8   -1.0
3.8 years ago, PoGibas (Vilnius) wrote:

I suggest not reusing the same filenames; add a specific suffix instead (e.g., _extracted).
I would use bash to loop over the files and awk to extract the information (check this awk multiple condition example).
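
One caveat with splitting on whitespace (awk's default): DSSP residue lines are fixed-width, so neighbouring values can run together without a space, as in `360.0-159.2` for residue 2 of the sample. Below is a minimal Python sketch that slices PHI and PSI by character position instead. It is an alternative to the bash/awk route, not part of the original answer, and it rests on several assumptions: the files use the classic DSSP layout shown in the question (PHI in characters 104-109, PSI in 110-115, counting from 1), the intended ranges are -99 <= phi <= -67 and 100 <= psi <= 165, and the input files are named `*.dssp`. All function names are hypothetical.

```python
from pathlib import Path

# 0-based slices for the fixed-width residue lines (assumed classic DSSP layout:
# PHI in characters 104-109 and PSI in 110-115 when counting from 1).
PHI_COLS = slice(103, 109)
PSI_COLS = slice(109, 115)

# Assumed intent of the question: -99 <= phi <= -67 and 100 <= psi <= 165.
PHI_RANGE = (-99.0, -67.0)
PSI_RANGE = (100.0, 165.0)


def phi_psi(line):
    """Return (phi, psi) read from one residue line, or None if not numeric."""
    try:
        return float(line[PHI_COLS]), float(line[PSI_COLS])
    except ValueError:
        # Line is too short or holds text at those positions.
        return None


def keep(line):
    """True if both angles of this residue line fall inside the ranges."""
    angles = phi_psi(line)
    if angles is None:
        return False
    phi, psi = angles
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])


def filter_dssp(lines):
    """Yield matching residue lines that follow the '#  RESIDUE' header."""
    in_table = False
    for line in lines:
        if in_table:
            if keep(line):
                yield line
        elif line.lstrip().startswith("#  RESIDUE"):
            # Everything before this header is file metadata, not residues.
            in_table = True


def filter_folder(src="f1", dst="f2", suffix="_extracted"):
    """Filter every *.dssp file in src and write the kept lines into dst."""
    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.dssp")):
        with path.open() as handle:
            kept = list(filter_dssp(handle))
        out_path = out_dir / f"{path.stem}{suffix}{path.suffix}"
        out_path.write_text("".join(kept))
```

Calling `filter_folder("f1", "f2")` writes `f2/NAME_extracted.dssp` for each `f1/NAME.dssp`; pass `suffix=""` to keep the original file names. The same positional slicing is available in awk through `substr($0, 104, 6)` and `substr($0, 110, 6)`, under the same layout assumption.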
