
Hi everyone,

I've recently been analyzing DNA methylation data and have run into a problem:

As we know, the distribution of DNA methylation varies across genomic features (core promoters, enhancers, CpG islands, etc.). I want to measure this distribution bias among genomic features.

In other words, I want to know how far the observed number of DNA methylation sites in each feature deviates from the expected number.

I read some papers and found various methods used for this kind of analysis, for example the `independent t-test`, `Chi-square test`, `Mann-Whitney U test`, `permutation test`, etc., which left me really confused about which one to choose.

I have tried the `independent t-test` and calculated `ratio = log2(mean of observed / mean of expected)` for plotting a heatmap (if the ratio is > 0, I conclude that DNA methylation occurs more often than expected in that region, and *vice versa*). However, someone told me that the `Chi-square test` may be better for measuring the difference between observed and expected counts, so I tried that too. The problem is that I only get one chi-square value per genomic feature, and these values vary enormously (from about 300 to 40,000,000), which makes them difficult to visualize.
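To make the two quantities concrete, here is a minimal sketch of how they could be computed for each feature. It assumes a uniform background (the expected count is proportional to the fraction of the genome the feature covers), which is only one possible null model. All counts, feature lengths, and names (`features`, `enrichment`, `total_sites`, `genome_size`) are hypothetical and for illustration only.

```python
import math

# Hypothetical numbers, for illustration only (not real data).
# feature -> (methylated sites observed in the feature, feature length in bp)
features = {
    "core_promoter": (2400, 2_000_000),
    "enhancer": (5000, 10_000_000),
    "CpG_island": (300, 3_000_000),
}
total_sites = 60_000        # assumed genome-wide number of methylated sites
genome_size = 100_000_000   # assumed genome size in bp


def enrichment(observed, feature_len, total_sites, genome_size):
    """Return (log2(observed/expected), chi-square statistic) for one feature."""
    # Expected count if sites were spread uniformly over the genome.
    expected_in = total_sites * feature_len / genome_size
    expected_out = total_sites - expected_in
    observed_out = total_sites - observed
    log2_ratio = math.log2(observed / expected_in)
    # Goodness-of-fit statistic over two cells: inside vs. outside the feature.
    chi2 = ((observed - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    return log2_ratio, chi2


results = {name: enrichment(obs, length, total_sites, genome_size)
           for name, (obs, length) in features.items()}
for name, (ratio, chi2) in results.items():
    print(f"{name}\tlog2(obs/exp)={ratio:+.2f}\tchi2={chi2:.1f}")
```

In this sketch, multiplying every count by 10 leaves the log2 ratio unchanged but multiplies the chi-square statistic by 10, which is consistent with the very wide range of chi-square values described above.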

So, I have several questions:

- Which method do you think is better suited to this kind of problem?
- If the chi-square test is used, how should I handle the chi-square values for visualization (for example, normalize the chi-square value of each region to that of a random region)?
- I noticed that the p-values are typically tiny (often 10e-100, sometimes even reported as 0). I looked at some answers on how to handle statistical tests on very large datasets and found no clear conclusion. So, if you run a statistical test on a very large dataset (sample sizes on the order of 10e6 are common in bioinformatics), how do you handle such small p-values?

Thanks for your time; I'd really appreciate any answers!

written 10 months ago by Houyu Zhang
