Bonferroni Correction for mostly overlapping enhancers
7.9 years ago
Vincent Laufer ★ 2.5k

I am running one burden test per enhancer region, across about 14,000 enhancers.

So, I could correct by taking a threshold of 0.05 / 14000.
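For reference, the naive Bonferroni threshold is just the family-wise alpha divided by the number of tests. A one-line check:

```python
# Naive Bonferroni: divide the family-wise alpha by the total number of tests
alpha = 0.05
n_tests = 14_000
threshold = alpha / n_tests
print(f"{threshold:.3e}")  # prints 3.571e-06
```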

However, some of these enhancers are exactly the same region, tested twice because they are active in more than one cell type. Others are distinct enhancers, but their sequence overlaps another enhancer by anywhere from 5% to 95%.
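"Percent overlap" can be defined several ways. A minimal sketch of one common choice, reciprocal overlap (shared bases divided by the length of the longer region), assuming (chrom, start, end) tuples with closed-interval coordinates; the sample output below contains regions with start == end, which is why closed intervals are assumed here rather than BED-style half-open ones. The function name is hypothetical.

```python
def reciprocal_overlap(a, b):
    """Overlap between two regions as a fraction of the longer one.

    Regions are (chrom, start, end) tuples. Coordinates are assumed to be
    closed intervals, so a region with start == end spans one base.
    Returns 0.0 for regions on different chromosomes or with no shared base.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    # Number of bases covered by both closed intervals
    shared = min(end_a, end_b) - max(start_a, start_b) + 1
    if shared <= 0:
        return 0.0
    longer = max(end_a - start_a + 1, end_b - start_b + 1)
    return shared / longer


# Two regions from the sample output that share a start coordinate
frac = reciprocal_overlap(("9", 123686862, 123696953), ("9", 123686862, 123700710))
print(round(frac, 3))
```

Identical regions score 1.0 and disjoint ones 0.0; a cutoff on this value is one way to decide when two regions count as "the same" enhancer.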

My question is: how should I establish a Bonferroni-corrected threshold for association?

Clearly these are not independent tests, and when many tests are strongly correlated, a Bonferroni correction (BCT) that counts all 14,000 of them is overly conservative. So what is the best fix here?

I want to run all the tests because I want to know whether the same, or similar, enhancers are acting in different cell types, so I do not wish to remove the redundancy before running the tests. Rather, I am looking for a way to adjust the threshold that recognizes that in some cases the same test is being run twice, simply because the enhancer is active in two (or more) cell types.
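A minimal sketch of the simplest adjustment: run every test, but count each distinct set of coordinates only once in the Bonferroni denominator, so an enhancer tested in two cell types contributes one test. The (chrom, start, end, cell_type) record layout, the function name, and the cell-type labels are hypothetical. This only collapses exact duplicates; partially overlapping regions are still counted separately.

```python
def bonferroni_unique(records, alpha=0.05):
    """Bonferroni threshold that counts each distinct region once.

    `records` is a list of (chrom, start, end, cell_type) tuples; identical
    coordinates seen in several cell types are treated as a single test.
    Returns (threshold, number_of_unique_regions).
    """
    unique = {(chrom, start, end) for chrom, start, end, _cell in records}
    return alpha / len(unique), len(unique)


records = [
    ("9", 123686862, 123696953, "cellA"),
    ("9", 123686862, 123696953, "cellB"),  # same region, second cell type
    ("9", 123687211, 123698710, "cellA"),
]
threshold, n_unique = bonferroni_unique(records)
print(n_unique, threshold)
```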

Here is some sample output of the genomic regions (chromosome, start, end):

9    123683021    123683106
9    123683247    123683365
9    123684113    123684195
9    123684377    123684427
9    123686862    123696953
9    123686862    123700710
9    123687211    123698710
9    123687211    123700338
9    123687211    123707408
9    123687211    123707657
9    123687372    123687432
9    123687834    123687912
9    123688216    123688217
9    123688862    123688862
9    123688862    123691442
9    123688862    123699585
9    123688999    123689022
9    123689204    123689238
9    123689588    123689608
9    123690238    123690261
9    123691041    123691141
9    123691341    123691469
9    123691469    123691469
9    123691601    123691601
9    123691750    123691883
9    123692452    123700338
9    123692868    123692868
9    123692868    123715017
9    123692939    123700496
9    123693079    123699083
9    123693945    123693947
9    123693945    123694466
9    123693945    123694942
9    123693945    123698490
9    123693945    123698890
9    123693945    123699083
9    123693945    123700183
9    123693945    123700891
9    123693945    123704168
9    123693945    123705308
9    123693945    123705462
9    123694366    123694420
9    123694739    123694846
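These regions overlap heavily. One way to see how many distinct loci they represent is to merge overlapping intervals into disjoint clusters. A minimal sketch on a subset of the rows above, again assuming closed-interval coordinates; the function name is hypothetical. Using the cluster count as an "effective number of tests" is only a heuristic: it lumps distinct enhancers that share bases, so dividing alpha by it is less stringent than dividing by the number of regions and does not by itself guarantee family-wise error control.

```python
def merge_regions(regions):
    """Merge overlapping (chrom, start, end) regions into disjoint clusters.

    Closed-interval coordinates are assumed; regions that share at least
    one base end up in the same cluster.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the current cluster: extend its end if needed
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


sample = [
    ("9", 123683021, 123683106),
    ("9", 123683247, 123683365),
    ("9", 123684113, 123684195),
    ("9", 123684377, 123684427),
    ("9", 123686862, 123696953),
    ("9", 123686862, 123700710),
    ("9", 123687211, 123698710),
    ("9", 123694366, 123694420),
]
clusters = merge_regions(sample)
heuristic_threshold = 0.05 / len(clusters)
print(len(clusters), clusters[-1], heuristic_threshold)
```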

7.9 years ago

I am not familiar with burden tests or with your data, so what I suggest may not apply.

Concerning the overlapping regions, I would decide how much overlap is acceptable before calling two regions different enhancers. For example, a 95% overlap is probably within the error of enhancer boundary definition and could be considered the same enhancer, whereas a 5% overlap means the sequences are essentially different and probably represent different enhancers. Alternatively, you could merge all overlapping regions into a set of non-overlapping regions and test those.

Concerning the test, my first thought was that you could replace the Bonferroni correction with an FDR correction. On second thought, though, assuming you have data for several cell types for each enhancer, I would try some sort of multivariate analysis to identify which enhancers are active, then do a post-hoc analysis to find out in which cells they are active.
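The usual FDR correction is the Benjamini-Hochberg step-up procedure. A minimal sketch in plain Python follows; the function name is hypothetical. BH is known to control FDR under independence and under certain kinds of positive dependence; whether overlapping enhancer tests satisfy that is an assumption, not something checked here.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected at
    false discovery rate q. Finds the largest rank k (1-based, p-values
    sorted ascending) with p_(k) <= k / m * q and rejects the k smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2]))
```

In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` implements the same procedure.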