Extract Data From Dssp Output
7.2 years ago
edwee • 0

I have some DSSP files in a folder f1. I want to extract the lines from these files only if the values in the PHI and PSI columns satisfy -99 <= phi <= -67 and 100 <= psi <= 165. I would like to save the output to another folder f2, using the input file names. I think this is very difficult to do with awk. I would highly appreciate your suggestions.

    #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA 
    1   98 A E              0   0  236      0, 0.0     2,-0.2     0, 0.0    21,-0.0   0.000 360.0 360.0 360.0 145.2   53.5   26.9    4.7
    2   99 A I        -     0   0   96     21,-0.0     2,-0.4    19,-0.0    21,-0.1  -0.518 360.0-159.2 -82.9 141.2   50.2   25.1    4.2
    3  100 A I        -     0   0   34     -2,-0.2    19,-2.4   154,-0.2     2,-0.6  -0.930   8.8-148.0-115.9 140.0   47.5   24.6    6.9
    4  101 A Q  E     -A   21   0A  60     -2,-0.4     2,-0.4    17,-0.2    17,-0.2  -0.948  11.6-149.4-114.1 115.7   44.9   21.9    6.5
    5  102 A I  E     -A   20   0A   0     15,-3.1    15,-2.8    -2,-0.6    50,-0.2  -0.704  20.9-116.4 -87.8 128.1   41.5   22.7    8.1
    6  103 A T        -     0   0   17     48,-3.0    13,-0.1    -2,-0.4    40,-0.1  -0.322  11.7-154.3 -64.4 140.8   39.4   19.9    9.4
    7  104 A T        -     0   0    1      2,-0.4   200,-0.3    39,-0.1    -1,-0.1   0.529  48.6-105.8 -88.9 -10.1   36.1   19.2    7.7
    8  105 A G  S    S+     0   0    9      1,-0.2     2,-0.5   198,-0.1   201,-0.3   0.196  97.2  94.5 101.9 -14.8   34.7   17.6   10.9
    9  106 A S     >  -     0   0    0    199,-0.2     4,-2.6     1,-0.1    -2,-0.4  -0.951  61.4-156.7-114.9 124.2   35.0   14.1    9.4
   10  107 A K  H  > S+     0   0  153     -2,-0.5     4,-2.2     1,-0.2    -1,-0.1   0.904  98.4  47.9 -62.8 -42.3   38.1   11.9   10.0
   11  108 A E  H  > S+     0   0   90      2,-0.2     4,-2.0     1,-0.2    -1,-0.2   0.895 113.1  46.7 -66.4 -41.5   37.5   10.0    6.8
   12  109 A L  H  > S+     0   0    1      2,-0.2     4,-1.8     1,-0.2    -2,-0.2   0.902 111.5  52.5 -68.1 -39.3   36.9   13.1    4.7
   13  110 A D  H  <>S+     0   0   23     -4,-2.6     5,-3.4     1,-0.2     6,-0.3   0.913 108.5  50.8 -61.1 -42.8   40.1   14.6    6.2
   14  111 A K  H ><5S+     0   0  140     -4,-2.2     3,-1.8     1,-0.2    -1,-0.2   0.915 108.2  51.4 -61.3 -43.2   42.1   11.5    5.2
   15  112 A L  H 3<5S+     0   0   31     -4,-2.0    -1,-0.2     1,-0.3    -2,-0.2   0.853 111.2  49.1 -61.2 -34.4   40.8   11.7    1.7
   16  113 A L  T ><5S-     0   0   15     -4,-1.8     3,-1.1    -5,-0.1    -1,-0.3   0.210 118.2-113.9 -89.9  13.4   41.9   15.3    1.6
   17  114 A Q  T < 5S-     0   0  157     -3,-1.8    -3,-0.2     1,-0.3    -2,-0.1   0.811  92.9 -18.2  59.0  32.6   45.3   14.4    3.0
   18  115 A G  T 3 <S-     0   0   37     -5,-3.4     2,-0.3    -6,-0.1    -1,-0.3  -0.081 117.1 -71.2 132.9 -36.9   44.6   16.3    6.2
   19  116 A G  S <  S-     0   0    3     -3,-1.1     2,-0.4    -6,-0.3   -13,-0.2  -0.957  84.5 -10.3 147.2-164.0   41.7   18.6    5.4
   20  117 A I  E    S-A    5   0A   1    -15,-2.8   -15,-3.1    -2,-0.3     2,-0.4  -0.545  71.0-128.8 -72.0 125.3   40.9   21.7    3.4
   21  118 A E  E >   -A    4   0A  59     -2,-0.4     3,-0.9   -17,-0.2   139,-0.4  -0.594   8.8-138.2 -83.6 127.4   44.1   23.3    2.2
   22  119 A T  T 3  S+     0   0    8    -19,-2.4   137,-0.3    -2,-0.4   136,-0.2  -0.496  87.2  38.6 -76.2 150.6   45.0   27.0    2.7
   23  120 A G  T 3  S+     0   0   26    135,-1.8     2,-0.3     1,-0.2    -1,-0.2   0.450 110.5  59.5  91.1  -0.5   46.5   28.7   -0.3
   24  121 A S  S <  S-     0   0    2     -3,-0.9   136,-2.3   134,-0.2     2,-0.4  -0.919  83.5 -95.1-149.5 174.3   44.4   27.0   -3.0
   25  122 A I  E     -b  160   0B   9     -2,-0.3   146,-2.4   134,-0.2     2,-0.5  -0.795  21.0-162.6-100.0 136.0   40.9   26.5   -4.2
   26  123 A T  E     -bc 161 171B   2    134,-2.8   136,-2.5    -2,-0.4     2,-0.6  -0.988  13.5-161.8-114.8 118.5   38.7   23.5   -3.2
   27  124 A E  E     -bc 162 172B  13    144,-3.0   146,-2.9    -2,-0.5     2,-0.5  -0.920   2.6-161.9-108.0 120.3   35.8   23.3   -5.6
   28  125 A X  E     -bc 163 173B   0    134,-3.0   136,-2.5    -2,-0.6     2,-0.4  -0.889   2.6-154.7-108.5 125.8   32.8   21.1   -4.5
   29  126 A F  E     + c   0 174B  42    144,-3.3   146,-2.6    -2,-0.5     2,-0.2  -0.814  44.0  83.6-100.8 135.6   30.2   19.8   -6.9
   30  127 A G        -     0   0   21    134,-1.8   134,-0.1    -2,-0.4     3,-0.1  -0.636  67.7 -89.6 149.1 153.7   26.7   19.0   -5.9
   31  128 A E    >   -     0   0  150     -2,-0.2     3,-2.2   144,-0.2     5,-0.4  -0.318  64.3 -67.9 -81.1 168.1   23.2   20.4   -5.2
   32  129 A F  T 3  S+     0   0   85      1,-0.3    -1,-0.2     2,-0.1    -2,-0.0  -0.304 124.8  30.0 -55.8 139.2   22.0   21.8   -1.9
   33  130 A R  T 3  S+     0   0  178     -3,-0.1    -1,-0.3     1,-0.0   169,-0.1   0.201  88.2  98.8  92.3 -11.9   21.7   18.9    0.6
   34  131 A T  S <  S-     0   0   22     -3,-2.2   142,-0.2   141,-0.1   150,-0.1   0.609 110.4 -96.8 -77.2 -10.7   24.5   16.8   -1.0
protein • 2.4k views
7.2 years ago
PoGibas 4.9k

I suggest not reusing the same filenames for the output; add a specific suffix instead (e.g., _extracted).
I would use bash to loop over the files and awk to extract the matching lines (check this awk multiple-condition example).
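
One caveat with splitting on whitespace: DSSP output is fixed-width, and neighbouring columns can run together (in the sample above, row 19 reads `147.2-164.0`, so PHI and PSI form a single whitespace-separated field). Below is a minimal sketch of the same loop-and-filter idea, written in Python rather than bash/awk so the columns can be sliced by position. It assumes the standard DSSP layout, with PHI in characters 104-109 and PSI in characters 110-115 of each residue line, and it reads the intended window as -99 <= phi <= -67 and 100 <= psi <= 165. The function names, the `_extracted` suffix, and the folder defaults are illustrative choices, not part of any DSSP tool.

```python
from pathlib import Path

# Assumption: standard fixed-width DSSP layout, with PHI in characters
# 104-109 and PSI in characters 110-115 (1-based) of each residue line.
PHI_COLS = slice(103, 109)
PSI_COLS = slice(109, 115)


def in_window(line, phi_range=(-99.0, -67.0), psi_range=(100.0, 165.0)):
    """Return True if the line's phi and psi both fall inside the given ranges."""
    try:
        phi = float(line[PHI_COLS])
        psi = float(line[PSI_COLS])
    except ValueError:
        # Short lines or non-numeric text at those positions: not a match.
        return False
    return (phi_range[0] <= phi <= phi_range[1]
            and psi_range[0] <= psi <= psi_range[1])


def extract_lines(lines):
    """Yield residue lines that follow the '#  RESIDUE' header and pass in_window."""
    in_table = False
    for line in lines:
        if not in_table:
            # Everything up to and including the column header is skipped.
            in_table = line.lstrip().startswith("#  RESIDUE")
            continue
        if in_window(line):
            yield line


def process_folder(src="f1", dst="f2", suffix="_extracted"):
    """Filter every file in src and write the kept lines to dst, adding a suffix."""
    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            kept = list(extract_lines(handle))
        # Hypothetical naming scheme: name.dssp -> name_extracted.dssp
        out_path = out_dir / f"{path.stem}{suffix}{path.suffix}"
        out_path.write_text("".join(kept))
```

Calling `process_folder("f1", "f2")` would then write one filtered file per input. The same positional idea carries over to awk by using `substr($0, 104, 6)` and `substr($0, 110, 6)` instead of `$N` fields, under the same column assumption.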
