Question

KAAS KEGG annotation

10.0 years ago

h.botond ▴ 50

Hello everybody!

I annotated my protein set with KAAS (KEGG Automatic Annotation Server). After the run I got two result files, an HTML file and a text file.

Can somebody tell me how I can extract the annotations from the HTML file? In the end I want to make two flat files: a KEGG Orthology file and a BRITE hierarchy file. Is there an easy way to do this?

Thanks for any help!

annotation kegg kaas • 4.7k views

updated 3.6 years ago by Ram 44k • written 10.0 years ago by h.botond ▴ 50

I want to annotate my protein set, so I want to download not only the KO numbers but the annotations too. As far as I can see, it is possible to open all the submenus and copy all the annotations into a text file, but then I have to reformat the whole document, which is a little awkward. Is there an easier way to do this, so that I get a table file with my genes, their annotations and the K numbers?

updated 3.6 years ago by Ram 44k • written 10.0 years ago by h.botond ▴ 50

You are supposed to comment in this box instead of the answer box. Check my edit to see if it answers your question.

10.0 years ago by Prakki Rama ★ 2.7k

Ram · Answer 1 · 2014-10-30

Open the HTML link and click 'exec'. This should list all the pathways that your input proteins hit, along with the corresponding genes in each pathway (collapse all). Which input protein hit which gene is recorded in the text file that you already have.

EDIT:

$ cat query.ko
Transg4.t1      K04539
Transg4.t2      K04539
Transg5.t1      K04982
Transg5.t2      K04982
Transg6.t1      K09596

$ cat collapsed.txt

Pathway Search Result

Sort by the number of hits
Hide all objects
ko01100 Metabolic pathways (14)

ko:K00129 E1.2.1.5; aldehyde dehydrogenase (NAD(P)+) [EC:1.2.1.5]
ko:K00411 UQCRFS1; ubiquinol-cytochrome c reductase iron-sulfur subunit [EC:1.10.2.2]
ko:K00710 GALNT; polypeptide N-acetylgalactosaminyltransferase [EC:2.4.1.41]
ko:K01106 E3.1.3.56; inositol-1,4,5-trisphosphate 5-phosphatase [EC:3.1.3.56]
ko:K01132 GALNS; N-acetylgalactosamine-6-sulfatase [EC:3.1.6.4]
ko:K01597 MVD; diphosphomevalonate decarboxylase [EC:4.1.1.33]
ko:K01711 gmd; GDPmannose 4,6-dehydratase [EC:4.2.1.47]
ko:K01772 hemH; ferrochelatase [EC:4.99.1.1]
ko:K02263 COX4; cytochrome c oxidase subunit 4
ko:K04710 CERS; ceramide synthetase [EC:2.3.1.24]
ko:K07419 CYP2R1; vitamin D 25-hydroxylase [EC:1.14.13.159]
ko:K07820 B3GALT2; beta-1,3-galactosyltransferase 2 [EC:2.4.1.-]
ko:K08074 ADPGK; ADP-dependent glucokinase [EC:2.7.1.147]
ko:K13499 CHSY; chondroitin sulfate synthase [EC:2.4.1.175 2.4.1.226]
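
As an aside, the listing above is regular enough to be flattened into one row per pathway and KO, which is close to the flat file asked for in the question. Below is a minimal Python sketch (Python rather than Perl, purely as an alternative), assuming the copied text always looks like the excerpt above: a `koNNNNN name (count)` header followed by `ko:KNNNNN name; description` entries. The function name `flatten_pathways` and both regular expressions are illustrative, not part of KAAS or KEGG.

```python
import re

# Pathway header, e.g. "ko01100 Metabolic pathways (14)"
PATHWAY_RE = re.compile(r"^(ko\d{5})\s+(.+?)\s+\((\d+)\)\s*$")
# KO entry, e.g. "ko:K02263 COX4; cytochrome c oxidase subunit 4"
# The name before ';' may contain dots or commas, so match up to the first ';'.
KO_RE = re.compile(r"^ko:(K\d{5})\s+([^;]+);\s*(.+?)\s*$")


def flatten_pathways(lines):
    """Yield (pathway_id, pathway_name, ko, gene_name, description) tuples."""
    pathway_id = pathway_name = None
    for raw in lines:
        line = raw.strip()
        header = PATHWAY_RE.match(line)
        if header:
            # Remember the current pathway for the KO entries that follow it.
            pathway_id, pathway_name = header.group(1), header.group(2)
            continue
        entry = KO_RE.match(line)
        if entry and pathway_id is not None:
            yield (pathway_id, pathway_name, entry.group(1),
                   entry.group(2).strip(), entry.group(3))


if __name__ == "__main__":
    # Assumes the copied page text was saved as collapsed.txt, as above.
    with open("collapsed.txt") as handle:
        for row in flatten_pathways(handle):
            print("\t".join(row))
```

Lines that match neither pattern (page titles, blank lines) are skipped, so the page furniture at the top of the file does no harm.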

Using Perl:

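#!/usr/bin/env perl
use strict;
use warnings;

# Map each K number to its description, taken from the pathway listing.
my %KHash;
open(my $collapsed, '<', 'collapsed.txt') or die "Cannot open collapsed.txt: $!";
while (my $line = <$collapsed>) {
    chomp $line;
    # Entries look like "ko:K00129 E1.2.1.5; aldehyde dehydrogenase ... [EC:1.2.1.5]".
    # The name before the ';' may contain dots or commas, so match up to the first ';'.
    if ($line =~ /^\s*ko:(K\d{5})\s+[^;]+;\s*(.+?)\s*$/) {
        $KHash{$1} = $2;
    }
}
close($collapsed);

# Print gene, K number and description for every annotated query gene.
open(my $query, '<', 'query.ko') or die "Cannot open query.ko: $!";
while (my $line = <$query>) {
    chomp $line;
    my ($gene, $ko) = split ' ', $line;
    next unless defined $ko;    # genes without a KO assignment have no second column
    print "$gene\t$ko\t$KHash{$ko}\n" if exists $KHash{$ko};
}
close($query);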

Result

$ perl annotatating_Transcripts_UsingKEGG_KAAS.pl
Transg4.t1         K04539    guanine nucleotide-binding protein subunit beta-5
Transg4.t2         K04539    guanine nucleotide-binding protein subunit beta-5
Transg5.t1         K04982    transient receptor potential cation channel subfamily M member 7 [EC:2.7.11.1]
Transg5.t2         K04982    transient receptor potential cation channel subfamily M member 7 [EC:2.7.11.1]

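Note: Some transcripts that have a KEGG ID are nevertheless not found in the collapsed file. For example, Transg6.t1 (K09596) is missing from the result above, presumably because the pathway listing only contains KOs that are placed in a pathway.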
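
To see which query genes fall into that gap, the two files can be compared directly. The following is a rough Python sketch (again Python instead of Perl, as an alternative), assuming `query.ko` holds a gene ID and an optional K number per line and that KO entries in the listing start with `ko:K` followed by five digits. The function names and the gene IDs in the demo are hypothetical.

```python
import re

# KO entry at the start of a line, e.g. "ko:K02263 COX4; ..."
KO_RE = re.compile(r"^ko:(K\d{5})\b")


def kos_in_listing(lines):
    """Collect every K number that appears as a 'ko:Kxxxxx' entry."""
    found = set()
    for line in lines:
        match = KO_RE.match(line.strip())
        if match:
            found.add(match.group(1))
    return found


def unmatched_genes(query_lines, listing_lines):
    """Return (gene, ko) pairs from query.ko whose KO is absent from the listing."""
    known = kos_in_listing(listing_lines)
    missing = []
    for line in query_lines:
        fields = line.split()
        if len(fields) < 2:  # gene without a KO assignment
            continue
        gene, ko = fields[0], fields[1]
        if ko not in known:
            missing.append((gene, ko))
    return missing


if __name__ == "__main__":
    # Hypothetical gene IDs; the listing line is taken from the excerpt above.
    query = ["geneA      K02263", "geneB      K09596", "geneC"]
    listing = ["ko:K02263 COX4; cytochrome c oxidase subunit 4"]
    for gene, ko in unmatched_genes(query, listing):
        print(f"{gene}\t{ko}\tnot in pathway listing")
```

Genes reported this way still have a valid K number in `query.ko`; they only lack a description from the pathway listing.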