Question: pathway mapping using KEGG
5.0 years ago by
United States
mwanerhi erfgtr30 wrote:

I have assigned KEGG IDs to my newly sequenced protein sequences using KEGG/KAAS, so I now have a list of IDs. How do I assign them to pathway maps? I need to know which of the genes (proteins) is in which family.

sequencing • 2.7k views
ADD COMMENTlink modified 3.6 years ago by Santiago Montero-Mendieta130 • written 5.0 years ago by mwanerhi erfgtr30
5.0 years ago by
Kamil2.0k wrote:

Could I ask you to provide an example of an input file and an example of your desired output? It might help us to better understand your question.

Perhaps you might find this tool useful?

ADD COMMENTlink written 5.0 years ago by Kamil2.0k

The input is a file of more than 5,000 protein sequences, e.g.:

>mgg4500002 qor, 1144-2184 (Clockwise) Quinone oxidoreductase
>mgg4500003 BASYS00003, 2160-2531 (Clockwise) Hypothetical Protein BASYS00003
>mgg4500004 insK, 3371-2562 (CounterClockwise) Putative transposase InsK for insertion sequence element IS150

output should look like this:

Amino acid metabolism

MAP00250 : Alanine, aspartate and glutamate metabolism

MAP00260 : Glycine, serine and threonine metabolism

MAP00270 : Cysteine and methionine metabolism

MAP00280 : Valine, leucine and isoleucine degradation

MAP00290 : Valine, leucine and isoleucine biosynthesis

MAP00300 : Lysine biosynthesis

MAP00310 : Lysine degradation

MAP00330 : Arginine and proline metabolism

MAP00340 : Histidine metabolism

MAP00350 : Tyrosine metabolism

MAP00360 : Phenylalanine metabolism

MAP00380 : Tryptophan metabolism

MAP00400 : Phenylalanine, tyrosine and tryptophan biosynthesis


Biosynthesis of other secondary metabolites

MAP00232 : Caffeine metabolism

MAP00311 : Penicillin and cephalosporin biosynthesis

MAP00401 : Novobiocin biosynthesis

MAP00402 : Benzoxazinoid biosynthesis

MAP00521 : Streptomycin biosynthesis

MAP00524 : Butirosin and neomycin biosynthesis

MAP00940 : Phenylpropanoid biosynthesis

MAP00950 : Isoquinoline alkaloid biosynthesis

MAP00960 : Tropane, piperidine and pyridine alkaloid biosynthesis

MAP00966 : Glucosinolate biosynthesis


with all proteins mapped to these pathways.

ADD REPLYlink written 5.0 years ago by mwanerhi erfgtr30
3.6 years ago by
Santiago Montero-Mendieta130 wrote:

I solved this by using GhostKOALA.

You just need to provide your query amino acid sequences in FASTA format and specify which KEGG GENES database file should be searched. You will get an email when your results are ready. In the results, "reconstruct pathway" tells you how many proteins match each family and also which of the genes is in each family. Hope it helps!
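If you would rather script the last step yourself, here is a minimal sketch. It assumes two inputs: the two-column, tab-separated annotation file that KAAS and GhostKOALA offer for download (gene ID, then K number, with the second column missing for unassigned genes, as far as I know), and a KO-to-pathway table in the two-column form returned by the KEGG REST `link` operation (for example https://rest.kegg.jp/link/pathway/ko). The function names and the sample data are made up for illustration.

```python
from collections import defaultdict


def parse_gene_ko(text):
    """Parse 'gene<TAB>KO' lines; genes without a KO assignment are skipped."""
    gene_to_ko = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].strip():
            gene_to_ko[fields[0].strip()] = fields[1].strip()
    return gene_to_ko


def parse_ko_pathway(text):
    """Parse 'ko:K00001<TAB>path:map00010' lines into KO -> set of map IDs."""
    ko_to_maps = defaultdict(set)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue
        ko = fields[0].split(":")[-1]
        pathway = fields[1].split(":")[-1].strip()
        # Assumption: the link table may list a pathway under both a
        # 'map' and a 'ko' prefix; keep only the 'map' form.
        if pathway.startswith("map"):
            ko_to_maps[ko].add(pathway)
    return ko_to_maps


def genes_by_pathway(gene_to_ko, ko_to_maps):
    """Combine the two tables into pathway -> sorted list of gene IDs."""
    result = defaultdict(list)
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_maps.get(ko, ()):
            result[pathway].append(gene)
    return {pathway: sorted(genes) for pathway, genes in result.items()}


# Invented sample data, only to show the shapes involved.
GENE_KO = "geneA\tK00001\ngeneB\ngeneC\tK00002\ngeneD\tK00001\n"
KO_PATH = (
    "ko:K00001\tpath:map00010\n"
    "ko:K00001\tpath:ko00010\n"
    "ko:K00002\tpath:map00040\n"
)

if __name__ == "__main__":
    table = genes_by_pathway(parse_gene_ko(GENE_KO), parse_ko_pathway(KO_PATH))
    for pathway in sorted(table):
        print(pathway, ":", ", ".join(table[pathway]))
```

In practice you would read the two tables from files instead of the sample strings. Pathway names could be joined on from a separately downloaded pathway list; that step is left out here.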

ADD COMMENTlink written 3.6 years ago by Santiago Montero-Mendieta130

How long does GhostKOALA usually take to run a ~5 MB AA FASTA file? Cheers

ADD REPLYlink written 3.5 years ago by h.l.wong60

I would say probably less than 1 hour. I tried with a 15 MB AA FASTA file and it took about 3 hours.

ADD REPLYlink written 3.5 years ago by Santiago Montero-Mendieta130

Thanks, I uploaded a 1.3 MB AA FASTA file and it took 22 hours. I guess the server is busy at the moment?

ADD REPLYlink written 3.5 years ago by h.l.wong60

Do the FASTA-formatted amino acid sequences have to be divided into proteins, like this:

>PROKKA_00002 hypothetical protein
>PROKKA_00003 ATP-dependent RNA helicase RhlE
>PROKKA_00004 Long-chain-fatty-acid--CoA ligase FadD13

I mean, they have to be, right? Otherwise, how would the program tell where one protein ends and the next one begins?

ADD REPLYlink written 3.5 years ago by willnotburn40

Welcome @willnotburn: As far as I am aware, partial protein sequences can be used as input too. This means that you can input sequences that either do not start with M (5prime_partial) or do not end with * (3prime_partial). I did not have any problem with internal protein sequences either.
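To see how many of your sequences fall into each of these categories before uploading, something like the sketch below could help. It only applies the two rules above (starts with M, ends with *); the function name and the "complete" and "internal" labels are my own choices. Note the assumption that your gene predictor writes the stop as `*`: many protein FASTA files omit it, in which case every sequence would look 3'-partial.

```python
def classify_partial(seq):
    """Label a protein sequence by whether it starts with M and ends with *."""
    seq = seq.strip().upper()
    has_start = seq.startswith("M")
    has_stop = seq.endswith("*")
    if has_start and has_stop:
        return "complete"
    if has_stop:
        # Stop present but no leading M: the 5' end is missing.
        return "5prime_partial"
    if has_start:
        # Leading M present but no stop: the 3' end is missing.
        return "3prime_partial"
    return "internal"


if __name__ == "__main__":
    # Invented toy sequences, one per category.
    for example in ("MKTLLV*", "KTLLV*", "MKTLLV", "KTLLV"):
        print(example, classify_partial(example))
```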

ADD REPLYlink written 3.5 years ago by Santiago Montero-Mendieta130

Thanks, Santiago! Partial protein sequence support definitely helps. But just so I get it clearly: each (full or partial) sequence has to have its own FASTA header >, followed by the sequence on the next line. Is that right?

ADD REPLYlink written 3.5 years ago by willnotburn40

Yep, it's just a regular FASTA formatted file :-)
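If you want to sanity-check the file before uploading, a minimal reader like the one below will do. This is a sketch, not a full FASTA validator: the function name is made up, and it only checks that no sequence data appears before the first `>` header. The sequence may wrap over several lines after its header; those lines are simply joined.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Headers borrowed from the example above; the sequences are invented.
    sample = (
        ">PROKKA_00002 hypothetical protein\n"
        "MKTLLVAG\n"
        "SSPRT\n"
        ">PROKKA_00003 ATP-dependent RNA helicase RhlE\n"
        "MSFDSLGL\n"
    )
    for name, seq in read_fasta(sample):
        print(name, len(seq))
```

A header with an empty sequence is returned as-is here; you may want to filter those out, since they carry nothing to search with.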

ADD REPLYlink written 3.5 years ago by Santiago Montero-Mendieta130