I have just been assigned to a project in bioinformatics, a field that is new to me. It involves the following:
I have a list of genes. These genes have been associated with pathways from the KEGG database, and I also have the full set of KEGG genes with their associated pathways. I need to find the pathways that are significantly represented in my dataset. To do that, I have to run a hypergeometric test and then keep the pathways with p-values below 0.005.
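As I understand it, the test asks: if I drew my n genes at random from a universe of N genes, how likely is it to see at least k of them in a pathway that contains K genes? Below is a minimal sketch of that computation in plain Python, assuming the "universe" is the set of all KEGG-annotated genes. The function names (`hypergeom_sf`, `enriched_pathways`) and the toy data are hypothetical, made up only to illustrate the idea; it also does not apply any multiple-testing correction.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) for X ~ Hypergeometric:
    N genes in the universe, K of them in the pathway, n genes in my list."""
    upper = min(K, n)  # the overlap can never exceed the pathway or list size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def enriched_pathways(my_genes, pathway_to_genes, universe, alpha=0.005):
    """Return {pathway: p_value} for pathways whose overlap has p < alpha."""
    my_genes = set(my_genes) & universe  # only count genes in the universe
    N, n = len(universe), len(my_genes)
    hits = {}
    for pathway, members in pathway_to_genes.items():
        members = set(members) & universe
        k = len(my_genes & members)  # observed overlap (the intersection)
        if k == 0:
            continue  # no overlap, so the p-value would be 1
        p = hypergeom_sf(k, N, len(members), n)
        if p < alpha:
            hits[pathway] = p
    return hits

# Hypothetical toy data: 1000 gene IDs and two made-up pathways.
universe = {f"g{i}" for i in range(1000)}
pathways = {
    "pathway_A": [f"g{i}" for i in range(20)],   # small pathway
    "pathway_B": [f"g{i}" for i in range(500)],  # very large pathway
}
my_genes = [f"g{i}" for i in range(10)] + [f"g{i}" for i in range(600, 610)]

result = enriched_pathways(my_genes, pathways, universe)
print(result)  # only pathway_A passes p < 0.005
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper-tail probability.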
What does it mean to keep only the pathways below this cut-off? If I already know which pathways the genes in my dataset belong to, why do I need a hypergeometric test at all? Why isn't it enough to detect the pathways present in my dataset simply by intersecting my gene set with KEGG's?
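To make the question concrete, here is a small sketch comparing the raw intersection size with the overlap one would expect by chance (n * K / N, the mean of the hypergeometric distribution). The function `overlap_vs_expected` and the toy data are hypothetical, chosen only for illustration: both pathways share 10 genes with my list, but for the large pathway that is exactly what random selection would give.

```python
def overlap_vs_expected(my_genes, pathway_to_genes, universe):
    """Return {pathway: (observed_overlap, expected_overlap_by_chance)}."""
    my_genes = set(my_genes) & universe
    N, n = len(universe), len(my_genes)
    table = {}
    for pathway, members in pathway_to_genes.items():
        members = set(members) & universe
        observed = len(my_genes & members)    # plain intersection size
        expected = n * len(members) / N       # mean overlap for random picks
        table[pathway] = (observed, expected)
    return table

# Hypothetical toy data: 1000 gene IDs and two made-up pathways.
universe = {f"g{i}" for i in range(1000)}
pathways = {
    "pathway_A": [f"g{i}" for i in range(20)],   # 20 genes
    "pathway_B": [f"g{i}" for i in range(500)],  # 500 genes
}
my_genes = [f"g{i}" for i in range(10)] + [f"g{i}" for i in range(600, 610)]

summary = overlap_vs_expected(my_genes, pathways, universe)
print(summary)  # {'pathway_A': (10, 0.4), 'pathway_B': (10, 10.0)}
```

Looking only at the intersection, both pathways appear "present" with 10 genes each; comparing against the expected overlap is what separates them.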