I am performing differential methylation analysis using the Illumina EPIC v2 array and want to clarify whether I should rely on the raw p-values or the adjusted p-values (e.g., FDR-corrected) to identify significantly differentially methylated CpG sites.
After applying multiple testing correction (e.g., Benjamini-Hochberg), I noticed that many CpG sites have the exact same adjusted p-value. Is this expected behavior?
Additionally, I found very few (or sometimes none) significant sites when using the adjusted p-values. Could this indicate that the effect sizes are small, or is it common in methylation studies with large-scale multiple testing?
It is common for many adjusted p-values to share the same (usually high, non-significant) value; how often this happens depends on the adjustment method. Having few significant adjusted p-values is also not unusual, depending on the nature of your experiment and the comparison you are setting up. If you are testing many probes at the same time, you should always apply a multiple testing adjustment and use the adjusted p-values.
You can manually calculate the median and the spread of each probe within the groups you are comparing, then look at the between-group difference for each probe in terms of beta or M-values. This isn't robust, but it will let you explore the effect sizes in an intuitive way.
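As a rough sketch of that manual check, the snippet below computes a per-probe difference in group medians (delta beta) and ranks probes by its absolute size. The probe names, group labels, and beta values are made up for illustration, and `beta_to_m` uses the usual logit transform M = log2(beta / (1 - beta)); this is an exploratory aid, not a replacement for a proper model.

```python
from math import log2
from statistics import median

# Hypothetical beta values per probe and group (illustrative only).
betas = {
    "cg_example_1": {"case": [0.82, 0.79, 0.85], "control": [0.60, 0.63, 0.58]},
    "cg_example_2": {"case": [0.41, 0.40, 0.43], "control": [0.42, 0.39, 0.41]},
}

def beta_to_m(beta):
    """Convert a beta value (strictly between 0 and 1) to an M-value."""
    return log2(beta / (1.0 - beta))

def median_delta(groups, transform=lambda x: x):
    """Difference in group medians (case minus control) for one probe."""
    case = median(transform(b) for b in groups["case"])
    control = median(transform(b) for b in groups["control"])
    return case - control

# Effect size on the beta scale and on the M-value scale.
delta_beta = {probe: median_delta(g) for probe, g in betas.items()}
delta_m = {probe: median_delta(g, beta_to_m) for probe, g in betas.items()}

# Probes ordered by absolute delta beta, largest first.
ranked = sorted(delta_beta, key=lambda p: abs(delta_beta[p]), reverse=True)

for probe in ranked:
    print(probe, round(delta_beta[probe], 3), round(delta_m[probe], 3))
```

Looking at the largest absolute differences alongside the adjusted p-values gives a sense of whether the top-ranked probes differ by a meaningful amount or only marginally.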
You should always use adjusted P-values when you are undertaking any analysis which involves multiple statistical tests.
After applying multiple testing correction (e.g., Benjamini-Hochberg), I noticed that many CpG sites have the exact same adjusted p-value. Is this expected behavior?
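Yes. The BH procedure scales each p-value by (number of tests / rank) and then forces the adjusted values to be non-decreasing with rank, so a run of neighbouring tests can all be assigned the same adjusted p-value.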
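To see where the ties come from, here is a minimal pure-Python sketch of the BH adjustment applied to five made-up p-values. The function name `bh_adjust` and the input values are assumptions for illustration; in practice you would use your analysis package's own adjustment routine.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so that adjusted values never
    # increase as the rank decreases. This step is what creates ties.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative raw p-values: three of them are close together.
raw = [0.001, 0.02, 0.021, 0.022, 0.5]
adj = bh_adjust(raw)

for p, q in zip(raw, adj):
    print(f"raw={p:<6} adjusted={q:.4f}")
```

Here the raw values 0.02, 0.021 and 0.022 all end up with the same adjusted value (about 0.0275), because the scaled value of the highest-ranked of the three is smaller than the scaled values of the other two and is carried down to them. With hundreds of thousands of probes the same mechanism produces long runs of identical adjusted p-values.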
Additionally, I found very few (or sometimes none) significant sites when using the adjusted p-values. Could this indicate that the effect sizes are small, or is it common in methylation studies with large-scale multiple testing?
There are many possible explanations for seeing few or no significantly different sites. It could be that there is genuinely no biological difference between the conditions. It could be that there are genuine differences, but the effect sizes are smaller than the noise levels. It could be due to sub-optimal handling of the experimental samples, batch effects in their processing, a problem with normalisation, or many other things.
Just show some rows of your result data frame so others can see it and give their insight. You can put