Question

Bonferroni correction for burden by transcripts

10 months ago

raphael.B ▴ 540

Hello,

I am running rare-variant association tests that aggregate variant effects over Ensembl transcripts. Traditional multiple-testing correction is based on the assumption that tests are independent, but that is not the case here because transcripts overlap. (Wrongly) assuming independence gives me a stringent threshold, which I have used so far, but it does not feel right. Neither does a threshold computed from the number of genes, which would not account for the differences between two transcripts of the same gene.

Thanks in advance for your suggestions,

Raphaël

Association-test statistics • 887 views

updated 10 months ago by Gordon Smyth ★ 8.4k • written 10 months ago by raphael.B ▴ 540

score 2 · Accepted Answer · 2024-12-19

Why would you think that Bonferroni assumes independence? Well-known methods like Bonferroni and Holm provide strong control of the family-wise error rate (FWER) under any dependence structure.
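To see why no independence assumption is needed for Bonferroni, the standard union-bound (Boole's inequality) argument can be written out as below. The notation is an assumption introduced here for illustration: m is the number of tests, I_0 is the set of true null hypotheses with m_0 = |I_0|, and Bonferroni rejects test i when p_i <= alpha/m.

```latex
\mathrm{FWER}
  = P\Big( \bigcup_{i \in I_0} \{ p_i \le \alpha/m \} \Big)
  \le \sum_{i \in I_0} P( p_i \le \alpha/m )
  \le m_0 \, \frac{\alpha}{m}
  \le \alpha .
```

The first inequality holds for any joint distribution of the p-values, and the second only requires each null p-value to be valid, i.e. P(p_i <= t) <= t. Overlap between transcripts therefore cannot inflate the FWER, although it can make the bound conservative.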

If you are interested in false discovery rate control instead of FWER, the Benjamini and Hochberg (BH) method has been formally proved to control the false discovery rate (FDR) only under a limited range of dependence structures but, in practice, it proves to be extremely robust to dependence between tests. I have shown myself that BH approximates the posterior probability of a false discovery using a ranked p-value argument that doesn't assume independence, so the robustness to dependence can be understood as flowing from that. Indeed, I do not know of anyone who has been able to demonstrate that BH fails to control the FDR correctly under a dependence structure that might realistically occur in high-throughput genomic data. However, if you weren't satisfied with that assurance, you could use Benjamini-Yekutieli (BY), which is more conservative than BH but which has been formally proved to control the FDR under general dependence. See the help page for the p.adjust function in R.
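For readers who want to see what these adjustments compute, here is a minimal Python sketch of the Bonferroni, BH and BY adjusted p-values. It is intended to mirror the usual step-up definitions rather than to replace p.adjust; the function names (`bonferroni_adjust`, `bh_adjust`, `by_adjust`) and the example p-values are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


def bonferroni_adjust(pvals: List[float]) -> List[float]:
    """Bonferroni adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def _step_up(pvals: List[float], scale: float) -> List[float]:
    """Shared step-up procedure: scale * m / rank * p, made monotone from the top."""
    m = len(pvals)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, scale * m / rank * pvals[i])
        adjusted[i] = running_min
    return adjusted


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values."""
    return _step_up(pvals, 1.0)


def by_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Yekutieli: BH inflated by the harmonic sum 1 + 1/2 + ... + 1/m."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    return _step_up(pvals, c_m)


if __name__ == "__main__":
    # Made-up p-values, e.g. one per transcript-level burden test.
    example = [0.01, 0.04, 0.03, 0.005]
    methods: List[Callable[[List[float]], List[float]]] = [
        bonferroni_adjust, bh_adjust, by_adjust,
    ]
    for method in methods:
        print(method.__name__, [round(q, 4) for q in method(example)])
```

The only difference between BH and BY in this sketch is the harmonic-sum factor, which is why BY is never less conservative than BH on the same set of p-values.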

The edgeR package uses the BH algorithm to control the FDR when doing differential expression analysis of Ensembl transcripts, and has been shown to control the FDR correctly in extensive simulations (Baldoni et al. 2024a,b). Overlap between transcripts from the same gene does cause overdispersion issues, which is the central focus of those papers, but it doesn't cause problems with FDR control.

References

Baldoni PL#, Chen Y#, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK (2024a). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13.

Baldoni PL, Chen L, Smyth GK (2024b). Faster and more accurate assessment of differential transcript expression with Gibbs sampling and edgeR v4. NAR Genomics and Bioinformatics 6(4), lqae151.