Question: Bonferroni multiple corrections question
jevanveen20 wrote:

Hello Biostars!

I have been analyzing some single cell RNA sequencing data, which compares neural transcriptomes from mice treated with either vehicle, or a drug. All the basic stuff is going fine.

When I get my clusters, I find few interesting, significant drug-induced DEGs. If I run GSEA on each cluster, however, the results look very interesting and align very well with the published literature. All good so far.

So my problem is this: I use fgsea in R to run my GSEAs, but I am running them on 18 different clusters. That means the adjusted p-values I get from fgsea are not valid as they stand; they need to be corrected to reflect the 18 repeated tests.

If I were to Bonferroni correct those p values, would it be acceptable to take the fgsea output adjusted p values that were already corrected for the number of gene sets tested, and then correct them again? Or is it more appropriate to take the raw p value from fgsea and somehow Bonferroni it for both the number of gene sets tested and also the number of clusters in which I am performing the tests?

modified 4 weeks ago by dsull420 • written 4 weeks ago by jevanveen20

Hi jevanveen,

I was thinking of running GSEA on my scRNA-seq data, but I'm still debating what the input from every single cluster should be. How do you capture the complexity of the cluster? Are you using the average gene expression from each cluster? Are you sampling a few cells from each cluster?

I'd really appreciate your thoughts on that. Thanks!

dsull420 wrote:

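You perform Bonferroni correction by dividing the p-value threshold (alpha) by the total number of tests performed or, equivalently, by multiplying each p-value by that number (capping the result at 1).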

If you have 18 clusters and 100 gene sets, that is 1800 tests, so divide the threshold by 1800 (i.e. your p-value threshold becomes 0.05/1800 instead of 0.05).

Yes, you can Bonferroni a Bonferroni-adjusted p-value. Using the example above, if GSEA already multiplies by 100 for you, then all you have to do is multiply by 18, which is mathematically equivalent to multiplying the raw p-values by 1800 (or, equivalently, dividing your p-value threshold by 1800).
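To make the arithmetic concrete, here is a minimal sketch in plain Python (standard library only). The p-values are made up for illustration, and `bonferroni` is a hypothetical helper, not part of fgsea. It checks that correcting in two steps (x100, then x18) gives the same result as correcting once (x1800), and that comparing adjusted p-values to 0.05 matches comparing raw p-values to 0.05/1800. One caveat: as far as I know, the `padj` column that fgsea reports is BH-adjusted rather than Bonferroni-adjusted, so the "multiply by 18" shortcut only applies if the first step really was Bonferroni; otherwise start from the raw `pval` column.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    if n_tests is None:
        n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Numbers from the example above: 18 clusters x 100 gene sets = 1800 tests.
n_clusters = 18
n_gene_sets = 100
n_total = n_clusters * n_gene_sets

# Made-up raw p-values (illustration only, not real fgsea output).
raw_p = [1e-6, 2e-5, 4e-4, 0.03]

# One step: correct for all 1800 tests at once.
one_step = bonferroni(raw_p, n_tests=n_total)

# Two steps: correct for the gene sets first, then for the clusters.
per_cluster = bonferroni(raw_p, n_tests=n_gene_sets)
two_step = bonferroni(per_cluster, n_tests=n_clusters)

alpha = 0.05
# Rule A: adjusted p-value compared against the usual alpha.
significant = [p <= alpha for p in one_step]
# Rule B: raw p-value compared against the shrunken threshold alpha / 1800.
threshold_rule = [p <= alpha / n_total for p in raw_p]

print(one_step)
print(two_step)
print(significant, threshold_rule)
```

In R, `p.adjust(p, method = "bonferroni", n = 1800)` on the pooled raw p-values should give the same adjusted values without a hand-written helper.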

Note: If no cluster comes out significant for any gene set, that doesn't mean no enrichment exists. Bonferroni is known for low statistical power (you're likely to get false negatives). The family-wise error rate (FWER), which Bonferroni controls, is the probability of making at least one false positive (e.g. Bonferroni ensures that this probability is <= 0.05, assuming an alpha of 0.05). There are also more powerful methods that still control the FWER, such as the Holm-Bonferroni method.
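As a rough illustration of the power difference, the sketch below implements the Holm-Bonferroni step-down adjustment next to plain Bonferroni in standard-library Python. The function names and the five p-values are hypothetical, chosen only to show a case where Holm rejects one more hypothesis than Bonferroni at alpha = 0.05.

```python
def bonferroni(p_values):
    """Bonferroni: multiply every p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm-Bonferroni step-down adjusted p-values.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values for five tests (illustration only).
raw_p = [0.001, 0.012, 0.02, 0.04, 0.3]

bonf_adj = bonferroni(raw_p)
holm_adj = holm(raw_p)

alpha = 0.05
n_bonf = sum(p <= alpha for p in bonf_adj)
n_holm = sum(p <= alpha for p in holm_adj)

print("Bonferroni rejections:", n_bonf)
print("Holm rejections:", n_holm)
```

Holm's adjusted p-values are never larger than Bonferroni's, so it rejects at least as many hypotheses while still controlling the FWER. In R, `p.adjust(p, method = "holm")` provides the same adjustment.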