Question: Scripting solution to generate a list of KEGG ORTHOLOGY (KO) terms from a tab-delimited annotation file
0
9 weeks ago by
jvire110
jvire110 wrote:

Does anyone happen to know a basic scripting (perhaps awk or python) approach to extracting KEGG orthology terms from a tab delimited annotation file?

The file in question has rows that look like this:

TRINITY_DN18877_c0_g1_i1    KEGG:zma:103654828`KEGG:zma:103654829`KEGG:zma:542341`KO:K02995
TRINITY_DN6301_c0_g1_i1     KEGG:zma:103647201`KO:K10798
TRINITY_DN12892_c3_g5_i1    KEGG:zma:103643875
TRINITY_DN13158_c1_g2_i35   KEGG:vvi:100249085`KO:K02435

What I ultimately need is to extract the transcript ID from column one and the KO terms from column two, like this:

TRINITY_DN6301_c0_g1_i1     K10798

The end goal is to use the list with KEGG Mapper (http://www.kegg.jp/kegg/tool/map_pathway.html) to see what KEGG pathways are present and most abundant in my transcriptome assembly.

rna-seq • 197 views
modified 9 weeks ago by Sparrow_kop170 • written 9 weeks ago by jvire110
2
9 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum100k wrote:
awk '{n=split($2,a,/`/);for(i=1;i<=n;++i) if(substr(a[i],1,3)=="KO:") printf("%s %s\n",$1,substr(a[i],4));}' input.txt
TRINITY_DN18877_c0_g1_i1 K02995
TRINITY_DN6301_c0_g1_i1 K10798
TRINITY_DN13158_c1_g2_i35 K02435

Thank you! Worked like a charm.

-James

written 9 weeks ago by jvire110
0
9 weeks ago by
Sparrow_kop170
China
Sparrow_kop170 wrote:

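In Python, assuming the delimiter is a tab. Splitting the second column on the backtick picks up every KO entry, even when it is not the last annotation in the field:

with open('your_file') as f:
    for line in f:
        # Column 1 is the transcript ID, column 2 the backtick-separated annotations
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue
        for term in fields[1].split('`'):
            # Keep only KO entries and drop the 'KO:' prefix
            if term.startswith('KO:'):
                print(fields[0] + '\t' + term[3:])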
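
Since the stated goal is to see which terms are present and most abundant, here is a minimal sketch that wraps the parsing in a reusable function and tallies the KO terms. The function name extract_ko_pairs and the variable names are my own, and the sample rows are copied from the question on the assumption that the real file separates the two columns with a tab.

```python
from collections import Counter


def extract_ko_pairs(lines):
    """Yield (transcript_id, ko_term) for every KO entry found in the lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        # Annotations in column 2 are separated by backticks
        for term in fields[1].split("`"):
            if term.startswith("KO:"):
                yield fields[0], term[len("KO:"):]


# Sample rows from the question (tab-delimited here by assumption)
sample = [
    "TRINITY_DN18877_c0_g1_i1\tKEGG:zma:103654828`KEGG:zma:103654829`KEGG:zma:542341`KO:K02995",
    "TRINITY_DN6301_c0_g1_i1\tKEGG:zma:103647201`KO:K10798",
    "TRINITY_DN12892_c3_g5_i1\tKEGG:zma:103643875",
    "TRINITY_DN13158_c1_g2_i35\tKEGG:vvi:100249085`KO:K02435",
]

pairs = list(extract_ko_pairs(sample))

# Count how many transcripts carry each KO term
ko_counts = Counter(ko for _, ko in pairs)

for transcript_id, ko in pairs:
    print(transcript_id + "\t" + ko)
```

For a real file, pass the open file handle instead of the sample list, e.g. extract_ko_pairs(open('your_file')). Rows without a KO entry, such as TRINITY_DN12892_c3_g5_i1 above, are simply skipped.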