Parsing codeml output into a tabular format?
21 months ago
sunnykevin97 ▴ 980

Hi,

I have a text file containing codeml output.

I'd like to rearrange the original text file into a tidy format (shown below), where records are separated by a `\n` line. Any suggestions?

Original file:

     \n
    OG0022515_M0
    lnL = -334.227266
    lnL = -334.227266
    lnL = -321.454615
    lnL = -325.325316
    ....
    ....
    ....
    ....
    Model: One dN/dS ratio for branches,
     Nei & Gojobori 1986. dN/dS (dN, dS)
     t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331
     t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331
     t= 0.0000  S=    31.7  N=   247.3  dN/dS=  0.0010  dN = 0.0000  dS = 0.0000
     t= 0.0000  S=    31.1  N=   247.9  dN/dS=  0.0010  dN = 0.0000  dS = 0.0000
     t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331
     ......
     ......
     ......
     t= 0.0000  S=    27.4  N=   251.6  dN/dS=  2.0000  dN = 0.0000  dS = 0.0000
     t= 0.0119  S=    33.2  N=   245.8  dN/dS=  0.0010  dN = 0.0000  dS = 0.0331
     t= 0.0109  S=    31.4  N=   247.6  dN/dS= 99.0000  dN = 0.0041  dS = 0.0000
     t= 0.0109  S=    31.4  N=   247.6  dN/dS= 99.0000  dN = 0.0041  dS = 0.0000
     t= 0.0119  S=    33.2  N=   245.8  dN/dS=  0.0010  dN = 0.0000  dS = 0.0331



     \n

    OG0022508_M0
    lnL = -389.618240
    lnL = -403.414349
    lnL = -396.165769
    lnL = -430.934701
    lnL = -428.190811
    lnL = -438.314006
    lnL = -430.934701
    lnL = -428.190811
    .....
    .....
    .....
    Model: One dN/dS ratio for branches,
     Nei & Gojobori 1986. dN/dS (dN, dS)
     t= 0.0216  S=    55.1  N=   247.9  dN/dS=  0.0010  dN = 0.0000  dS = 0.0394
     t= 0.0429  S=    54.9  N=   248.1  dN/dS=  0.0689  dN = 0.0041  dS = 0.0601
     t= 0.0207  S=    56.8  N=   246.2  dN/dS=  0.2171  dN = 0.0041  dS = 0.0189
     t= 0.1162  S=    64.3  N=   238.7  dN/dS=  0.0257  dN = 0.0043  dS = 0.1665
     t= 0.1022  S=    64.3  N=   238.7  dN/dS=  0.0296  dN = 0.0043  dS = 0.1445
     ......
     ......
     ......
     ......

     t= 0.1253  S=    62.8  N=   240.2  dN/dS=  0.0505  dN = 0.0085  dS = 0.1688
     t= 0.1162  S=    64.3  N=   238.7  dN/dS=  0.0257  dN = 0.0043  dS = 0.1665
     t= 0.1022  S=    64.3  N=   238.7  dN/dS=  0.0296  dN = 0.0043  dS = 0.1445
     t= 0.1253  S=    62.8  N=   240.2  dN/dS=  0.0505  dN = 0.0085  dS = 0.1688
     t= 0.0000  S=    70.0  N=   233.0  dN/dS=  0.0010  dN = 0.0000  dS = 0.0000
     t= 0.1162  S=    64.3  N=   238.7  dN/dS=  0.0257  dN = 0.0043  dS = 0.1665
     t= 0.1022  S=    64.3  N=   238.7  dN/dS=  0.0296  dN = 0.0043  dS = 0.1445
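
Each `t=` line above is a run of `label= value` pairs with irregular spacing, so a regular expression is more robust than picking fixed `split()` positions. A minimal sketch, assuming the six labels in the sample (`t`, `S`, `N`, `dN/dS`, `dN`, `dS`) are the only ones that appear; `PAIR_RE` and `parse_t_line` are hypothetical names, not part of PAML or any library.

```python
import re

# One "label = value" pair; the required "=" keeps "dN" from matching inside "dN/dS".
PAIR_RE = re.compile(r"(dN/dS|dN|dS|t|S|N)\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_t_line(line):
    """Turn one codeml 't=' line into a dict of floats keyed by label."""
    return {label: float(value) for label, value in PAIR_RE.findall(line)}


line = " t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331"
print(parse_t_line(line))
# {'t': 0.0228, 'S': 33.6, 'N': 245.4, 'dN/dS': 0.1244, 'dN': 0.0041, 'dS': 0.0331}
```

Lines without `=` (such as `Model: ...` or `Nei & Gojobori 1986...`) simply return an empty dict.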

Expected output:

I'd like to parse the text file into a more readable, tabular format:

    ID            lnL                t          S         N          dN/dS           dN           dS
    OG0022515_M0  lnL = -334.227266  t= 0.0228  S= 33.6   N= 245.4   dN/dS= 0.1244   dN = 0.0041  dS = 0.0331
    OG0022515_M0  lnL = -334.227266  t= 0.0228  S= 33.6   N= 245.4   dN/dS= 0.1244   dN = 0.0041  dS = 0.0331
    OG0022515_M0  lnL = -334.227266  t= 0.0000  S= 31.7   N= 247.3   dN/dS= 0.0010   dN = 0.0000  dS = 0.0000
    OG0022515_M0  lnL = -321.454615  t= 0.0000  S= 31.1   N= 247.9   dN/dS= 0.0010   dN = 0.0000  dS = 0.0000
    OG0022515_M0  lnL = -325.325316  t= 0.0228  S= 33.6   N= 245.4   dN/dS= 0.1244   dN = 0.0041  dS = 0.0331
    ..........
    ..........
    ...........
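
A minimal end-to-end sketch of that transformation, assuming (1) every record starts with an ID line shaped like `OG0022515_M0`, (2) the i-th `lnL` line belongs with the i-th `t=` line, which is how most rows of the expected output line up but which the post does not confirm, and (3) tab-separated output is acceptable. It keys on line labels rather than fixed offsets, so the `\n` separators and the `Model:` / `Nei & Gojobori` lines are skipped. `parse_codeml`, `rows_for_record` and `write_tsv` are hypothetical helpers built only on the standard library (`re`, `csv`, `itertools.zip_longest`); the `sample` is a trimmed excerpt of the file above.

```python
import csv
import re
import sys
from itertools import zip_longest

# Record header such as "OG0022515_M0" (ASSUMPTION: IDs look like OG<digits>_<suffix>).
ID_RE = re.compile(r"^\s*(OG\d+_\w+)\s*$")
# "lnL = -334.227266"
LNL_RE = re.compile(r"^\s*lnL\s*=\s*(-?\d+(?:\.\d+)?)")
# "label = value" pairs on a " t= ... dS = ..." line.
PAIR_RE = re.compile(r"(dN/dS|dN|dS|t|S|N)\s*=\s*(-?\d+(?:\.\d+)?)")
FIELDS = ["ID", "lnL", "t", "S", "N", "dN/dS", "dN", "dS"]


def rows_for_record(record_id, lnls, tlines):
    """Pair the i-th lnL with the i-th t= line (ASSUMPTION); a missing side becomes NA."""
    rows = []
    for lnl, pairs in zip_longest(lnls, tlines):
        row = {"ID": record_id, "lnL": lnl if lnl is not None else "NA"}
        row.update(pairs or {})  # t, S, N, dN/dS, dN, dS kept as original strings
        rows.append(row)
    return rows


def parse_codeml(text):
    """Return one dict per output row."""
    rows, record_id, lnls, tlines = [], None, [], []
    for line in text.splitlines():
        id_match = ID_RE.match(line)
        lnl_match = LNL_RE.match(line)
        if id_match:  # a new record starts: emit the previous one first
            if record_id is not None:
                rows.extend(rows_for_record(record_id, lnls, tlines))
            record_id, lnls, tlines = id_match.group(1), [], []
        elif lnl_match:
            lnls.append(lnl_match.group(1))
        elif line.lstrip().startswith("t="):
            tlines.append(dict(PAIR_RE.findall(line)))
        # everything else ("\n" separators, "Model:", "Nei & Gojobori", ...) is ignored
    if record_id is not None:
        rows.extend(rows_for_record(record_id, lnls, tlines))
    return rows


def write_tsv(rows, out=sys.stdout):
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            restval="NA", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


sample = r"""\n
OG0022515_M0
lnL = -334.227266
lnL = -334.227266
lnL = -321.454615
Model: One dN/dS ratio for branches,
 Nei & Gojobori 1986. dN/dS (dN, dS)
 t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331
 t= 0.0228  S=    33.6  N=   245.4  dN/dS=  0.1244  dN = 0.0041  dS = 0.0331
 t= 0.0000  S=    31.7  N=   247.3  dN/dS=  0.0010  dN = 0.0000  dS = 0.0000
\n
OG0022508_M0
lnL = -389.618240
 t= 0.0216  S=    55.1  N=   247.9  dN/dS=  0.0010  dN = 0.0000  dS = 0.0394
"""
write_tsv(parse_codeml(sample))
```

For the real file, reading it with `open(...)` and passing `fh.read()` to `parse_codeml` should be enough. `zip_longest` pads with `NA` instead of silently dropping rows when a record has more `lnL` lines than `t=` lines (or vice versa). Because values stay as strings, the TSV can later be loaded with `pandas.read_csv(..., sep="\t")` if numeric columns are needed.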

I tried this:

    import pandas as pd

    lines = []
    with open("/home/sun/Documents/genesite/Selection_Analysis/M0_res.txt", mode='r') as f:
        for line in f:
            lines.append(line.rstrip())

    res = []
    for i in range(len(lines)):
        if "OG" in lines[i] and "\\n" not in lines[i+1]:
            alignment = lines[i].split("_")[0]
    #        lnl = lines[i+1].split()[4]
    #        omega = lines[i+2].split()[3]
            dN = lines[i+3].split()[4]
            dS = lines[i+4].split()[4]
            res.append([alignment, dN, dS])

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-5-527c7275e76d> in <module>
          6 #        omega = lines[i+2].split()[3]
          7         dN = lines[i+3].split()[4]
    ----> 8         dS = lines[i+4].split()[4]
          9         res.append([alignment, dN, dS])

    IndexError: list index out of range
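
`IndexError: list index out of range` can come from either index on the highlighted line: `lines[i+4]` fails when the `OG` line is one of the last four lines of the list, and `.split()[4]` fails when the line at that offset has fewer than five whitespace-separated tokens. In the layout shown in the question, the lines right after an ID line are `lnL = ...` lines, which split into only three tokens, so the fixed offsets `i+3`/`i+4` do not land on the `t=` lines at all. The guarded lookup below is a hypothetical helper (`token_at`) that only illustrates both failure modes; avoiding the crash does not fix the offsets, so matching lines by their `lnL` / `t=` labels is the more robust route.

```python
def token_at(lines, row, col):
    """Return lines[row].split()[col], or None when either index is out of range."""
    if not 0 <= row < len(lines):
        return None  # e.g. lines[i+4] when the ID line is near the end of the file
    tokens = lines[row].split()
    if not 0 <= col < len(tokens):
        return None  # e.g. column 4 of a 3-token "lnL = ..." line
    return tokens[col]


lines = ["OG0022515_M0", "lnL = -334.227266", "lnL = -334.227266"]

print(lines[1].split())        # ['lnL', '=', '-334.227266']: only 3 tokens
print(token_at(lines, 1, 2))   # -334.227266
print(token_at(lines, 1, 4))   # None (where .split()[4] would raise IndexError)
print(token_at(lines, 4, 4))   # None (where lines[4] would raise IndexError)
```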

What have you tried?
