Controlling for FDR in 450k methylation microarray data
13 months ago
eesiribloom ▴ 80

I am analysing illumina 450K methylation data.

I have 79 samples to analyse in total.

I have been running my analysis on all CpG probes (pre-processed, e.g. filtered on p-value, with probes overlapping known SNPs removed), which leaves ~460,000 probes in total.

I am using minfi or limma to look for sites that are differentially methylated between two groups: 13 samples with a mutation of interest versus 66 samples without. If I follow the package vignette/user guide and look at sites/DMPs with an adjusted p-value (qvalue) of <0.05 (BH adjusted), is this corrected for multiple testing across sites (i.e. across all 460,000 probes) or across samples? or both?

I have multiple covariates included in my design matrix but only one comparison of interest (samples with or without the mutation of interest) so I am assuming I do not need to correct for multiple contrasts (limma can do this too I believe).

My concern is that, given such a large number of CpG probes, a small number of sites are bound to appear differentially methylated between the two groups by chance. I'm also not entirely clear how to interpret the results, as the output (limma topTable) has many different statistics, such as f-statistic, pvalue, qvalue and B value. It seems that adjusted p value (qvalue) seems to be the one to focus on. What is the explicit interpretation of a DMP with qvalue < 0.05 - would it be 'this site is differentially methylated between samples with and without the mutation of interest when corrected for multiple testing across sites'?

Is it worth reducing noise and potentially increasing my degrees of freedom by restricting my analysis to e.g. the 1000 most variable sites, as might be done when analysing DEGs? I haven't seen many examples of this for methylation data, though.

Cheers

minfi methylation 450k limma epigenetics • 517 views
13 months ago
LChart 3.9k

It seems that adjusted p value (qvalue) seems to be the one to focus on.

Correct

is this corrected for multiple testing across sites (i.e. across all 460,000 probes) or across samples? or both?

Across sites: there is one test per site, and the number of tests does not change if you change the number of samples.
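To make the "one test per site" point concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment written out by hand. It is not limma's or minfi's code; the function name `bh_adjust` and the six p-values are made up for illustration. The calculation is the one R's `p.adjust(p, method = "BH")` performs, and the only quantity that enters the correction is m, the number of p-values (sites), never the number of samples.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, one per test (here: one per CpG site)."""
    m = len(pvalues)  # number of tests = number of sites, not samples
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for six sites (illustration only).
site_pvalues = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.7]
adjusted = bh_adjust(site_pvalues)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]

# The same six sites embedded in a larger set of 1000 tested sites:
# the correction gets stricter because m grew.
padded = bh_adjust(site_pvalues + [1.0] * 994)
```

With six sites the smallest p-value is multiplied by 6; with 1000 sites it is multiplied by 1000, so the same raw p-value can stop being significant once many more sites are tested.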

What is the explicit interpretation of a DMP with qvalue < 0.05

The correct interpretation of a q-value (FDR) < 0.05 is "Of all the significant CpG sites with FDR < 0.05, we expect no more than 5% to be false positives." For a specific CpG site, you can say that it is differentially methylated at an FDR of 5%, which means that it belongs to a set of identified differentially methylated CpG sites which, together and in expectation, should contain no more than 5% false positives.
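A small simulation can illustrate this "in expectation" reading. The sketch below rests on assumptions that are not from the thread: 9000 null sites with uniform p-values, 1000 truly different sites whose p-values are pushed towards zero by an arbitrary transform, and independent tests (real neighbouring CpGs are correlated). The helper names `bh_reject` and `simulate_fdp` are hypothetical.

```python
import random


def bh_reject(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    # Find the largest rank whose p-value falls under its BH threshold.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def simulate_fdp(n_null=9000, n_alt=1000, alpha=0.05, rng=None):
    """One simulated experiment; returns (false discovery proportion, n rejected)."""
    rng = rng or random.Random()
    # Null sites: uniform p-values. Non-null sites: p-values concentrated
    # near zero (an assumed toy model, not fitted to methylation data).
    pvals = [rng.random() for _ in range(n_null)]
    pvals += [rng.random() ** 8 for _ in range(n_alt)]
    rejected = bh_reject(pvals, alpha)
    false_pos = sum(1 for i in rejected if i < n_null)
    return false_pos / max(len(rejected), 1), len(rejected)


rng = random.Random(1)
runs = [simulate_fdp(rng=rng) for _ in range(20)]
mean_fdp = sum(fdp for fdp, _ in runs) / len(runs)
print(f"mean false discovery proportion over 20 runs: {mean_fdp:.3f}")
```

Any single run can contain somewhat more or fewer than 5% false positives among its significant sites; it is the average over repeated experiments that the procedure keeps at or below the chosen level.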
