I am trying to do a pathway analysis (GO enrichment analysis, GSEA, etc.) on RNA-seq data using R. I found that the KEGG database recently updated its pathway list and now includes a COVID-19 pathway. However, I do not want any COVID-19-related results, because my experiment and project have little or no connection to COVID.
Therefore, I am looking for a valid way to exclude this pathway from the analysis. It would be helpful if you could suggest R code to exclude it for packages such as fgsea and clusterProfiler, which are commonly used for pathway analysis.
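To make this concrete, here is a minimal sketch of the kind of pre-filtering I have in mind, using base R only. The gene-set list, its names, and its gene members are toy placeholders; I am assuming the named-list format that fgsea takes for its `pathways` argument, and hsa05171 is the KEGG ID that I believe corresponds to the COVID-19 pathway.

```r
# Toy gene-set collection: a named list of character vectors.
# Names and gene members are placeholders, not real KEGG content.
pathways <- list(
  "hsa00010 Glycolysis / Gluconeogenesis"   = c("HK1", "PFKM", "PKM"),
  "hsa05171 Coronavirus disease - COVID-19" = c("ACE2", "TMPRSS2", "IL6"),
  "hsa05012 Parkinson disease"              = c("SNCA", "PRKN", "PINK1")
)

# Drop every gene set whose name matches the unwanted pathway.
drop_pattern <- "COVID-19|hsa05171"
pathways_filtered <- pathways[!grepl(drop_pattern, names(pathways))]

print(names(pathways_filtered))

# Assumed downstream use (not run here; ranked_stats is a hypothetical
# named numeric vector of gene-level statistics):
# fgsea::fgsea(pathways = pathways_filtered, stats = ranked_stats)
```

For clusterProfiler, I assume the analogous step would be to filter a `TERM2GENE` data frame in the same way before passing it to `enricher()` or `GSEA()`, but I am not sure whether that is the recommended approach.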
Furthermore, I would like to know whether the number of candidate pathways affects the result. In my example, if I exclude the COVID-19 pathway, the query database will contain n-1 pathways (where n is the total number of pathways available in the database). Will this change the p-values or FDR values of the remaining pathways?
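Part of my concern comes from how multiple-testing correction works: a Benjamini-Hochberg adjustment depends on how many p-values go into it. The sketch below uses base R's `p.adjust()` on made-up p-values (the pathway names and numbers are purely illustrative) to compare the adjusted values with and without one pathway. Whether the raw p-values themselves would also change is something I do not know, and it presumably depends on the method.

```r
# Made-up raw p-values for four pathways (illustrative only).
p_all <- c(covid = 0.001, glycolysis = 0.01, tca = 0.03, parkinson = 0.2)

# BH adjustment over all n pathways.
fdr_all <- p.adjust(p_all, method = "BH")

# Same raw p-values, but with one pathway removed before adjustment (n - 1 tests).
p_sub   <- p_all[names(p_all) != "covid"]
fdr_sub <- p.adjust(p_sub, method = "BH")

# Side-by-side comparison for the pathways present in both runs.
comparison <- data.frame(
  pathway      = names(p_sub),
  raw_p        = unname(p_sub),
  fdr_n        = unname(fdr_all[names(p_sub)]),
  fdr_n_minus1 = unname(fdr_sub)
)
print(comparison)
```

In this toy case the adjusted values of the remaining pathways do shift when one test is dropped, which is why I am asking whether the exclusion is statistically sound.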
More generally, I wonder whether it is valid to exclude certain pathways, or to compile a customized list of pathways, for a pathway analysis. In particular, I am interested in metabolism-related pathways, whereas many of the curated disease pathways in KEGG (e.g. Parkinson's disease) are irrelevant to my work. Can I pull out only the metabolism-related pathways for the analysis, and how would I do so in R? Alternatively, is it valid to run the analysis on all pathways and then filter for a specific group of pathways before visualizing the results? I find it very annoying to have many irrelevant pathways show up at the top of the list.
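For the second option, filtering after the analysis, this is roughly what I imagine. It is a base R sketch on a made-up results table: the column names (`pathway`, `category`, `padj`) and all values are hypothetical, and I am assuming I can somehow attach a KEGG top-level category such as "Metabolism" or "Human Diseases" to each pathway.

```r
# Made-up results table, as if returned by an enrichment run on all pathways.
# Column names and values are hypothetical.
res <- data.frame(
  pathway  = c("Parkinson disease", "Glycolysis / Gluconeogenesis",
               "Coronavirus disease - COVID-19", "Citrate cycle (TCA cycle)"),
  category = c("Human Diseases", "Metabolism", "Human Diseases", "Metabolism"),
  padj     = c(0.001, 0.020, 0.004, 0.045),
  stringsAsFactors = FALSE
)

# Keep only the category of interest, then sort by adjusted p-value for plotting.
res_metab <- res[res$category == "Metabolism", ]
res_metab <- res_metab[order(res_metab$padj), ]

print(res_metab)
```

Here the adjusted p-values are still the ones computed over the full pathway list, and only the display is restricted. I am unsure whether reporting them this way is acceptable or whether they would need to be recomputed on the subset.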