How to compute the "completeness" of a KEGG module using a set of KEGG orthologs? (Module Completion Ratio)
3.1 years ago
O.rka ▴ 720

This is a problem I have been trying to figure out for years, and I haven't been able to get any attention on it on any forum. In metagenomics it's extremely important to be able to assess how complete a particular metabolic module is with respect to a genome. With the implementation of KOFAMSCAN they made it really easy to know which KEGG orthologs are associated with which ORF/gene, but knowing how complete a module is with respect to a genome is really confusing, and there is no clear way to do this.

Some of these calculations are simple, but others are not very straightforward because of the "definition" nomenclature. For example, M00357 is fairly complex. Its definition is: ((K00925 K00625),K01895) (K00193+K00197+K00194) (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) (K00399+K00401+K00402) (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))
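Before worrying about what each operator means, it helps to see that the top-level spaces split the definition into separate steps. A minimal sketch, assuming that spaces outside all parentheses separate consecutive steps (which is also how the chunk-by-chunk breakdown further down reads it); the function name `top_level_steps` is hypothetical:

```python
def top_level_steps(definition):
    """Split a KEGG module definition on spaces that sit outside all parentheses."""
    steps, current, depth = [], [], 0
    for char in definition.strip():
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == " " and depth == 0:
            # A top-level space closes the current step.
            if current:
                steps.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        steps.append("".join(current))
    return steps


M00357 = ("((K00925 K00625),K01895) (K00193+K00197+K00194) "
          "(K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) "
          "(K00399+K00401+K00402) "
          "(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,"
          "K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))")

for number, step in enumerate(top_level_steps(M00357), start=1):
    print(number, step)
```

For M00357 this yields five steps, the same five chunks that are broken out one by one later in the question.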

According to the "Help" for KEGG modules, the definition is described by the following:

The definition of the module as a list of K numbers for pathway/signature modules and RC numbers for reaction modules. Comma separated K numbers or RC numbers indicate alternatives. Plus signs are used to represent a complex or a combination and a minus sign denotes a non-essential component in the complex.

Main points from the description:

  • Comma separated K numbers or RC numbers indicate alternatives.
  • Plus signs are used to represent a complex or a combination
  • A minus sign denotes a non-essential component in the complex.
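The plus and minus rules can be checked in isolation on a flat complex (K numbers joined only by `+` or `-`, with no commas, spaces or parentheses). A minimal sketch, assuming the first component and every `+` component are essential and every `-` component is skipped; `complex_coverage` is a hypothetical name, and scoring by the fraction of essential components found is one choice among several (an all-or-nothing rule is another):

```python
import re


def complex_coverage(component_string, present):
    """Fraction of the essential components of a flat complex found in `present`.

    Flat means only K numbers joined by '+' or '-' (no commas, spaces or
    parentheses). The first component and every '+' component count as
    essential; '-' components are treated as non-essential and skipped.
    """
    # Each match is (sign, KO); the first KO has an empty sign.
    parts = re.findall(r"([+-]?)(K\d{5})", component_string)
    essential = [ko for sign, ko in parts if sign != "-"]
    if not essential:
        raise ValueError(f"no essential components in {component_string!r}")
    return sum(ko in present for ko in essential) / len(essential)


chunk = "K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584"
print(complex_coverage(chunk, {"K00577", "K00578", "K00579"}))  # 3 of 6 essential
```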

MAPLE was the only tool I know of that could do this, but the MAPLE service was discontinued at the end of February 2019.

I've asked a similar question in the past but got no responses whatsoever.

Other people have asked similar questions: Complete or incomplete KEGGs pathways??

I'm trying to do this in a high-throughput way, so I'm trying to write a function in Python:

def module_completion_ratio(definition: str, orthology_set: set):
    """Return the fraction of the module `definition` covered by the KOs in `orthology_set`."""
    # So much empty...
    mcr = None
    return mcr
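One way to fill in the stub above is a small recursive-descent parser over the definition. This is a sketch under stated assumptions, not how KEGG or MAPLE scored modules: spaces separate steps whose scores are averaged, commas are alternatives and the best-scoring one is kept, `+` components are averaged as essential parts of a complex, `-` components are ignored, and the module score is the mean over top-level steps. It also assumes `+`/`-` bind tighter than `,`, which binds tighter than a space (consistent with M00357 needing parentheses around `(K00925 K00625)`). It does not handle `--` placeholders or a complex that starts with an optional `-` component, and the helper names `tokenize` and `_Parser` are hypothetical:

```python
import re

# Identifiers (K or RC numbers) plus the single-character operators.
TOKEN_RE = re.compile(r"K\d{5}|RC\d{5}|[()+,\- ]")


def tokenize(definition):
    """Split a definition into identifiers, operators and single spaces."""
    text = " ".join(definition.split())  # collapse runs of whitespace
    tokens = TOKEN_RE.findall(text)
    if "".join(tokens) != text:  # findall silently skips unknown characters
        raise ValueError(f"unsupported syntax in {definition!r}")
    return tokens


class _Parser:
    """Recursive-descent scorer; precedence is '+'/'-' > ',' > ' '."""

    def __init__(self, tokens, present):
        self.tokens = tokens
        self.pos = 0
        self.present = present

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def sequence(self):
        # Space-separated steps: every step is needed, so average the scores.
        scores = [self.alternatives()]
        while self.peek() == " ":
            self.take()
            scores.append(self.alternatives())
        return sum(scores) / len(scores)

    def alternatives(self):
        # Comma-separated options: keep the best-scoring one.
        best = self.complex()
        while self.peek() == ",":
            self.take()
            best = max(best, self.complex())
        return best

    def complex(self):
        # '+' components are essential; '-' components are parsed but ignored.
        essential = [self.item()]
        while self.peek() in ("+", "-"):
            op = self.take()
            score = self.item()
            if op == "+":
                essential.append(score)
        return sum(essential) / len(essential)

    def item(self):
        tok = self.take()
        if tok == "(":
            score = self.sequence()
            if self.take() != ")":
                raise ValueError("unbalanced parentheses")
            return score
        if tok is not None and tok.startswith(("K", "RC")):
            return 1.0 if tok in self.present else 0.0
        raise ValueError(f"unexpected token {tok!r}")


def module_completion_ratio(definition: str, orthology_set: set) -> float:
    parser = _Parser(tokenize(definition), set(orthology_set))
    mcr = parser.sequence()
    if parser.peek() is not None:
        raise ValueError("trailing tokens after a complete definition")
    return mcr


M00357 = ("((K00925 K00625),K01895) (K00193+K00197+K00194) "
          "(K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584) "
          "(K00399+K00401+K00402) "
          "(K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,"
          "K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))")
print(module_completion_ratio(M00357, {"K01895", "K00193", "K00197", "K00194"}))
```

With only K01895, K00193, K00197 and K00194 present, steps 1 and 2 score 1.0 and steps 3 to 5 score 0.0, so this scheme returns 2/5 = 0.4. Counting a step as complete only when its score reaches 1.0 would give a stricter, step-count style ratio instead.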

Going back to the complex example above for M00357. Let's break it up chunk by chunk:

  • ((K00925 K00625),K01895)

    • Is this saying either (K00925 AND K00625) or just K01895 alone?
  • (K00193+K00197+K00194)

    • This is straightforward: all of these are essential.
  • (K00577+K00578+K00579+K00580+K00581-K00582-K00583+K00584)

    • Is this saying all are essential except K00582 and K00583?
  • (K00399+K00401+K00402)

    • Straightforward again...
  • (K22480+K22481+K22482,K03388+K03389+K03390,K08264+K08265,K03388+K03389+K03390+K14127+(K14126+K14128,K22516+K00125))

    • For this beast here, is it saying either (K22480+K22481+K22482) OR (K03388+K03389+K03390) OR (K08264+K08265) OR (K03388+K03389+K03390+K14127 AND EITHER (K14126+K14128) OR (K22516+K00125))?
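If the reading above is right, the whole module can be written out by hand as plain set logic, which is handy as a reference case when testing a general parser. A sketch assuming that reading (space = consecutive steps that are all required, comma = OR, plus = AND, minus = optional); `m00357_steps` is a hypothetical, module-specific helper and uses an all-or-nothing rule per step:

```python
def m00357_steps(kos):
    """Hand-translated, all-or-nothing reading of M00357 (one bool per step)."""

    def has(*ids):
        # True only if every listed KO is present.
        return all(k in kos for k in ids)

    return [
        # Step 1: (K00925 then K00625) OR K01895
        has("K00925", "K00625") or has("K01895"),
        # Step 2: three-subunit complex, all essential
        has("K00193", "K00197", "K00194"),
        # Step 3: K00582 and K00583 are '-' (non-essential), so they are left out
        has("K00577", "K00578", "K00579", "K00580", "K00581", "K00584"),
        # Step 4: three-subunit complex, all essential
        has("K00399", "K00401", "K00402"),
        # Step 5: four alternatives; the last one nests a further choice
        has("K22480", "K22481", "K22482")
        or has("K03388", "K03389", "K03390")
        or has("K08264", "K08265")
        or (has("K03388", "K03389", "K03390", "K14127")
            and (has("K14126", "K14128") or has("K22516", "K00125"))),
    ]


steps = m00357_steps({"K01895", "K00193", "K00197", "K00194"})
print(steps, sum(steps) / len(steps))
```

Under this all-or-nothing reading, the fourth alternative of the last step can never change the result: it requires K03388+K03389+K03390, which already satisfies the second alternative on its own.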

Are there any scripts in R or Python that have already coded this up? The logic is quite confusing, and automating this is going to be quite frustrating. If you don't know of any, can you help me understand whether my logic above makes sense and, if not, why?

Tags: kegg, database, metagenomics, module
3.1 years ago
Mensur Dlakic ★ 27k

MicrobeAnnotator was complicated to install, but it has at least some of the functionality you asked about:

Summarize the metabolic potential using KEGG modules by extracting KO numbers associated with each match in the databases used. The summary output is a matrix with module completion and two plots showing module completeness per genome (see below).


Oh, this is good. It looks a bit confusing to use in my context, but I might be able to dismantle this script: https://github.com/cruizperez/MicrobeAnnotator/blob/master/independent_scripts/ko_mapper.py

I didn’t realize different module types are parsed differently. Hmm... this is probably the most useful bit of code so far for what I’m trying to do. Thanks!
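The KEGG help quoted in the question already points at one difference between module types: pathway and signature modules are written in K numbers, while reaction modules use RC numbers. A minimal sketch of routing a definition on that basis, assuming RC numbers are written as `RC` followed by five digits; `identifier_kinds` is a hypothetical name, and this is not a description of what ko_mapper.py does:

```python
import re


def identifier_kinds(definition):
    """Return which identifier kinds ('K', 'RC') appear in a module definition."""
    kinds = set()
    if re.search(r"\bK\d{5}\b", definition):
        kinds.add("K")
    if re.search(r"\bRC\d{5}\b", definition):
        kinds.add("RC")
    return kinds


print(identifier_kinds("(K00193+K00197+K00194)"))
```

A definition that reports only `{"RC"}` would then go to a reaction-module branch instead of being matched against the KO set.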
