The attached image[[1]] shows the metabolic pathway Citrate cycle which, according to the KEGG database, contains 6 functional modules with 20, 39, 5, 34, 10 and 12 genes, respectively. These functional modules (identified by the letter M followed by an ID number) are sets of genes (KO groups) that can be used as markers of a phenotype, namely a metabolic capacity.
In metagenomic research it is very difficult to identify all the genes present in a sample because of problems with sampling, assembly, gene prediction, etc. Therefore, to determine whether a pathway is present in a given sample, it is necessary to identify the modules that serve as signatures of that pathway.
In the image[[1]], for example, the KO groups that I identified in my sample using the KAAS tool are shown in red.
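To make the question concrete, here is a minimal sketch of one possible decision rule: compute, for each module, the fraction of its KOs that KAAS detected, and call the pathway present if at least some number of modules reach a coverage threshold. The module IDs, KO names, `min_coverage` and `min_modules` values are all hypothetical placeholders. The sketch also assumes each module is a flat set of KOs, which is a simplification: real KEGG module definitions contain alternative KOs and complexes.

```python
from typing import Dict, Set

def module_coverage(modules: Dict[str, Set[str]], found: Set[str]) -> Dict[str, float]:
    """Fraction of each module's KOs detected in the sample.

    Assumes each module is a flat set of KOs (a simplification of
    KEGG module definitions, which allow alternatives and complexes).
    """
    return {mid: len(kos & found) / len(kos) for mid, kos in modules.items()}

def pathway_supported(coverage: Dict[str, float],
                      min_coverage: float = 0.8,
                      min_modules: int = 2) -> bool:
    """Hypothetical rule: the pathway counts as present if at least
    `min_modules` modules reach `min_coverage`."""
    complete = [m for m, c in coverage.items() if c >= min_coverage]
    return len(complete) >= min_modules

# Toy data: placeholder IDs, not real KEGG modules or KO groups.
modules = {
    "M_A": {"KO1", "KO2", "KO3", "KO4", "KO5"},
    "M_B": {"KO3", "KO6"},
    "M_C": {"KO7", "KO8", "KO9", "KO10"},
}
found = {"KO1", "KO2", "KO3", "KO4", "KO6", "KO7"}  # e.g. KOs reported by KAAS

cov = module_coverage(modules, found)
print(cov)
print(pathway_supported(cov))
```

Changing `min_modules` (1, 2, 3, or the total number of modules) is exactly the choice the question is about; the sketch only makes that parameter explicit, it does not say which value is correct.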
My question is: is identifying only one of these modules enough to conclude that the pathway is present in the sample, or do we have to require a minimum number of modules, for example 2, 3, or even all of them?
Answering this question is important for defining a statistical strategy to identify metabolic pathways in metagenomic samples.
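As one possible statistical starting point, the sketch below uses the hypergeometric upper tail, the test commonly used in gene-set enrichment. It asks how likely it is to see at least `k` of a module's KOs if the sample's annotated KOs were drawn at random from the KO universe. This assumes random, independent detection (a strong simplification), and all argument values in the example are hypothetical toy numbers. The function name `hypergeom_sf` is illustrative only.

```python
from math import comb

def hypergeom_sf(k: int, total: int, module_size: int, sample_size: int) -> float:
    """P(X >= k), where X is the number of a module's KOs found when
    `sample_size` KOs are drawn without replacement from `total` KOs.

    Assumption: detected KOs behave like a random draw from the KO
    universe, which ignores biological and technical dependencies.
    """
    upper = min(module_size, sample_size)
    denom = comb(total, sample_size)
    hits = sum(
        comb(module_size, i) * comb(total - module_size, sample_size - i)
        for i in range(k, upper + 1)
    )
    return hits / denom

# Toy example: 10 KOs in the universe, a 5-KO module, 5 KOs detected,
# 4 of which belong to the module.
print(hypergeom_sf(4, total=10, module_size=5, sample_size=5))
```

A small tail probability would suggest the module's KOs are over-represented relative to chance; in practice the result would need multiple-testing correction across modules and pathways.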
How can I obtain evidence, based on the KEGG database, that a metabolic pathway is actually present in the sample?