A heads-up for anyone working with high-dimensional omics data -- FDR correction can sometimes lie to you. A summary based on our recent publication: https://doi.org/10.1186/s13059-025-03734-z
The widely used False Discovery Rate (FDR) control method, Benjamini-Hochberg (BH), is a staple in omics research. But when analysing datasets with dependencies between features (gene expression, methylation, metabolites, QTL analyses, and more), it can behave unexpectedly.
Even when a study contains no true biological signal (all null hypotheses are true), the BH method can occasionally generate thousands of statistically "significant" findings. This happens because dependencies in the data can push many features over the significance threshold together. The overall FDR is still controlled -- in this all-null setting that simply means fewer than 5% of experiments make any false discovery -- but the experiments that do make errors can make thousands of them.
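To make this concrete, here is a minimal simulation sketch (not taken from the paper; the correlation value, feature count, and the use of scipy/statsmodels are my own illustrative assumptions). All null hypotheses are true, but a shared latent factor correlates the test statistics, so BH discoveries tend to arrive in occasional large bursts rather than a steady trickle:

```python
# Illustrative sketch: all-null experiments with correlated test statistics.
# rho, n_features and n_experiments are arbitrary choices for demonstration.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_features, n_experiments, rho = 5000, 200, 0.8

counts = []
for _ in range(n_experiments):
    # Equicorrelated null z-scores: one shared factor plus independent noise.
    shared = rng.standard_normal()
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal(n_features)
    p = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values; every null is true
    reject, _, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
    counts.append(reject.sum())

counts = np.array(counts)
print("fraction of experiments with any (false) discovery:", (counts > 0).mean())
print("largest single-experiment burst of false discoveries:", counts.max())
```

Most simulated experiments yield zero discoveries (so FDR is controlled on average), but the few that do reject can reject thousands of features at once; exact numbers will vary with the seed and the assumed correlation.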
A Counter-Intuitive Trap: Using real-world and simulated data (methylation, gene expression, metabolite and eQTL analyses), we found this phenomenon to be pervasive. The primary danger is that researchers may be misled by the sheer volume of these false findings. It feels intuitive to believe that if hundreds or thousands of features are flagged as significant, at least some of them must be real. However, we show this intuition can be wrong; it is possible that every single finding is false.
Risk of an Inflated Number of False Discoveries: This statistical artefact can lead researchers to incorrectly conclude that an underlying biological mechanism exists, a claim that might even form the main conclusion of their study. Issues like broken test assumptions, study biases, or the researcher's flexibility in analysing the data can make the problem even worse.
So, what can you do? We suggest a few key strategies. Use negative controls/synthetic null data and the other diagnostic checks recommended in the article to detect and mitigate these issues. If you keep using the BH method, make sure you understand its assumptions and formal guarantees so that you interpret the findings correctly. As a safer alternative, consider the Benjamini-Yekutieli (BY) method when you can tolerate a bit more type II error: it doesn't eliminate the issue entirely, but it makes these large bursts of false positives much less frequent and less severe, and it is a good compromise between the popular BH method and overly conservative FWER corrections.
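For reference, switching from BH to BY is a one-argument change in common tooling. The sketch below uses statsmodels' multipletests; the uniform random p-values are just a placeholder for your own test results:

```python
# Illustrative sketch: BH vs BY on the same p-value vector.
# The p-values here are placeholders; substitute your own results.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.random.default_rng(1).uniform(size=1000)

bh_reject, bh_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
by_reject, by_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_by")

print("BH discoveries:", bh_reject.sum())
print("BY discoveries:", by_reject.sum())  # BY is never more liberal than BH
```

BY pays for its robustness to arbitrary dependence with an extra log(m)-sized penalty on the adjusted p-values, which is why it costs some power (type II error) relative to BH.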
The bottom line: be aware of dependencies in your data! When false findings occur in highly correlated datasets, they can be numerous. Don't let your intuition fool you. Read the full open-access paper here: https://doi.org/10.1186/s13059-025-03734-z
Thanks to all collaborators & co-authors for useful inputs, brainstorming and perspectives: Maria Mamica, Emilie Willoch Olstad, Ingrid Hobæk Haff, Manuela Zucknick, Jingyi Jessica Li, & Geir Kjetil Sandve.