Hi,

I have recently started studying a substance's effect on a cell line at different doses. For this I am using a tool called bmdexpress2, whose input is a large matrix of normalized RNA-Seq counts for each dose. In the pathway analysis step, unlike the GSEA Hallmark gene sets, this tool defines pathways using databases that include pathways with as few as 2-3 genes.

So what I want to discuss here is how much weight small pathways deserve and which filtering thresholds to use. How can we decide, bioinformatically, how important a pathway made of 3 genes is? Should we simply filter such pathways out, since there are much more comprehensive ones? Or does their GO level matter? Also, in many cases only 1 of those 3 genes is differentially expressed; is having 1 differentially expressed gene out of X genes in a pathway enough to say that the pathway is enriched?

Here is an example GO term for a small pathway.

Thanks

There should be a p-value associated with each enrichment; do you have those? Generally, one should not draw firm conclusions from gene enrichment analysis alone. From my perspective, the chance of false-positive associations is very high.

There is a p-value, based on a two-tailed Fisher's exact test, and even after filtering a good number of small pathways come out as significant. I posted the same question on Stack and was advised to correct for multiple comparisons, but I do not know how to do that on such data, because the tool does all the work internally and gives me an output made only of results and scores.
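To see what a two-tailed Fisher's exact test reports for the "1 of 3 genes" case, here is a minimal sketch in plain Python built on the hypergeometric distribution. It is not bmdexpress2's internal code: the background of 10,000 measured genes and the 100 or 200 DE genes are hypothetical numbers, and it assumes the background is every gene that was measured.

```python
from math import comb

def overlap_pmf(k: int, n_genes: int, n_de: int, path_size: int) -> float:
    """Hypergeometric P(X = k): exactly k of the pathway's genes are DE,
    given n_genes measured genes of which n_de are DE."""
    return (comb(n_de, k) * comb(n_genes - n_de, path_size - k)
            / comb(n_genes, path_size))

def fisher_pathway(hits: int, n_genes: int, n_de: int, path_size: int):
    """Return (one-sided enrichment p, two-sided p) for a pathway with
    `hits` DE members, using the same 2x2 logic as Fisher's exact test."""
    lo = max(0, path_size - (n_genes - n_de))
    hi = min(path_size, n_de)
    if not lo <= hits <= hi:
        raise ValueError("hits is impossible for these counts")
    pmf = {k: overlap_pmf(k, n_genes, n_de, path_size) for k in range(lo, hi + 1)}
    # One-sided: probability of seeing at least this many DE members.
    p_greater = sum(p for k, p in pmf.items() if k >= hits)
    # Two-sided: sum every outcome no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_obs = pmf[hits]
    p_two = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))
    return min(p_greater, 1.0), min(p_two, 1.0)

# Hypothetical numbers: 10,000 measured genes, a 3-gene pathway, 1 DE member.
for n_de in (100, 200):
    p_one, p_two = fisher_pathway(hits=1, n_genes=10_000, n_de=n_de, path_size=3)
    print(f"{n_de} DE genes overall: one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

Under these assumptions, a single DE gene in a 3-gene pathway gives p of about 0.03 when 100 genes are DE overall but about 0.06 when 200 are, so whether "1 of 3" looks enriched depends heavily on how many genes are DE in total. If SciPy is installed, `scipy.stats.fisher_exact([[1, 2], [99, 9898]])` should return the same two-sided p-value for the 100-DE case.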

Correcting for multiple comparisons was also my suggestion. From experience, I notice that Fisher's exact test can give highly statistically significant p-values whilst the adjusted p-values can be far from statistically significant. Adjusting Fisher's p-values is just not my area, so you would likely get a better response on Stack itself.
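If the tool's output table includes the raw Fisher p-values, one option is to export them and apply a Benjamini-Hochberg false-discovery-rate adjustment yourself (check the tool's documentation first in case it already reports adjusted values). Below is a minimal plain-Python sketch of the standard BH step-up procedure; the pathway names and p-values are made up, and the `n_tests` parameter is a hypothetical addition for the case where the tool only reports pathways that passed its filter.

```python
def benjamini_hochberg(pvalues, n_tests=None):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order.

    n_tests is the total number of pathways tested. If only filtered
    pathways are reported, pass the full count; assuming every unreported
    pathway had a larger p-value, the result is conservative.
    """
    m = n_tests if n_tests is not None else len(pvalues)
    if m < len(pvalues):
        raise ValueError("n_tests cannot be smaller than the number of p-values")
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(len(pvalues), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, as they might be exported from an output table.
raw = {"pathway_A": 0.001, "pathway_B": 0.01, "pathway_C": 0.03,
       "pathway_D": 0.04, "pathway_E": 0.20}
names, pvals = list(raw), list(raw.values())
for n_tests in (None, 50):
    adj = benjamini_hochberg(pvals, n_tests)
    label = "only the 5 reported" if n_tests is None else f"{n_tests} tested in total"
    print(label, {n: round(q, 3) for n, q in zip(names, adj)})
```

The second run mirrors the experience described above: if 50 pathways were tested but only these 5 are reported, every adjusted value grows sharply and only pathway_A stays near 0.05. That is why `m` should be the number of pathways tested, not the number that survived the tool's filter.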

Just from experience, though, be wary of relying on the raw Fisher's p-value. Small pathways obviously have fewer members, so a single differentially expressed gene makes up a large fraction of the pathway, and the chance of a statistically significant association is therefore higher for these. I don't believe Fisher's exact test makes any intelligent adjustment for the size of the pathways.
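To make the size effect concrete, this sketch computes the one-sided (enrichment) hypergeometric p-value for a pathway containing exactly one DE gene, which is simply the probability that a random gene set of that size contains at least one DE gene by chance. The background of 10,000 measured genes with 200 DE is a hypothetical assumption.

```python
from math import comb

def p_at_least_one_hit(n_genes: int, n_de: int, path_size: int) -> float:
    """P(X >= 1): chance that a random pathway of `path_size` genes contains
    at least one DE gene, i.e. the one-sided Fisher/hypergeometric p-value
    for a pathway with exactly one DE member."""
    return 1 - comb(n_genes - n_de, path_size) / comb(n_genes, path_size)

# Hypothetical background: 10,000 measured genes, 200 of them DE.
for size in (2, 3, 5, 10, 50):
    p = p_at_least_one_hit(10_000, 200, size)
    print(f"pathway of {size:>2} genes, 1 DE member: p = {p:.3f}")
```

Under these assumptions a single DE gene gives p of about 0.04 for a 2-gene pathway and about 0.06 for a 3-gene pathway, while larger pathways need several DE members to get anywhere near significance. A lone hit in a very small pathway can therefore be reported as "enriched" on the strength of one gene. A common safeguard is a minimum gene-set size applied before testing (GSEA, for instance, skips sets smaller than 15 genes by default); because it does not look at which genes are DE, it also reduces the number of tests to correct for. The exact threshold remains a judgment call.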