Question

Multiple testing in 450K array data

12 months ago

eebloom ▴ 80

I am going slightly mad trying to find out how people deal, if at all, with such a large number of probes in 450K array data (I don't even want to think about EPIC/850K arrays!). Surely, when identifying DMRs or DMPs, testing (close to) all 450K probes is far too noisy to detect differences.

I am working with a dataset of 79 samples (control group 66, test group 13). Even after preprocessing the 450K array data and filtering out probes associated with SNPs etc., I am still left with over 400,000 probes. Right now I am barely picking out any DMPs or DMRs after correcting for multiple testing (FDR < 0.05), and my concern is that with so many probes it is hard to see how much could survive the correction.

I am wondering about using, e.g., the 1000 most variable probes based on SD or MAD, or even just filtering out low-variance probes, but this doesn't seem to be particularly common in the literature for identifying DMPs/DMRs, as opposed to, say, identifying DEGs from expression data.
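To make the filtering idea concrete, here is a rough sketch in plain Python (standard library only, not minfi code). It ranks probes by MAD without looking at the group labels, keeps the top few, and applies a Benjamini-Hochberg adjustment. The function names `mad`, `top_variable_probes` and `bh_adjust`, and the toy beta values, are made up for illustration; a real analysis would work on the full beta or M-value matrix with p-values from an actual test.

```python
import statistics


def mad(values):
    """Median absolute deviation of one probe's values across samples."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_variable_probes(betas, n_keep):
    """Indices of the n_keep probes with the largest MAD.

    betas is a list of probes, each a list of per-sample values.
    Group labels are deliberately not used, so the filter does not
    look at the contrast that will be tested afterwards.
    """
    ranked = sorted(range(len(betas)), key=lambda i: mad(betas[i]), reverse=True)
    return sorted(ranked[:n_keep])


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): three probes, four samples each.
toy_betas = [
    [0.5, 0.5, 0.5, 0.5],  # no variation at all
    [0.1, 0.9, 0.2, 0.8],  # highly variable
    [0.4, 0.5, 0.6, 0.5],  # mildly variable
]
kept = top_variable_probes(toy_betas, n_keep=2)

# Hypothetical raw p-values, just to exercise the adjustment.
toy_adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

The arithmetic behind the worry: under BH at FDR 0.05, the smallest p-value has to be below 0.05/m to pass on its own, which is 1.25e-7 for m = 400,000 tests but 5e-5 for m = 1000. Filtering only helps if the probes that are dropped are mostly uninformative ones.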

What is the common/standard way to deal with multiple testing? Is it a good idea to remove probes with low variation?

Any guidance welcome!

minfi methylation 450K epigenetics statistics • 358 views
