Question

Skewed beta-distribution from Methylation EPIC array data

4.2 years ago

erwin.tomasich • 0

Dear all, I'm a rookie in the field of bioinformatics. We recently started to work with Methylation EPIC array data from Illumina.

For our last 4 chips we received weird (skewed) beta distribution densities, even after within-array normalization using the ChAMP package. Has anyone ever seen something similar, and do you have any advice?

Your help is very much appreciated!

Kind regards, Erwin

[Image: beta-value density plots, not reproduced here]

methylation EPIC Illumina ChAMP • 1.8k views

updated 4.2 years ago by Charles Warden 8.2k • written 4.2 years ago by erwin.tomasich • 0

Cross-posted: https://support.bioconductor.org/p/127944/

4.2 years ago by Kevin Blighe 87k

I'd expect some peak in the middle for human samples: these are imprinted genes. Or is the question about something else?

4.2 years ago by German.M.Demidov ★ 2.9k

I imagine the question is "why is the normalization making the signals vastly less comparable?", which I have no good answer to. Honestly, the before-normalization curves look more reasonable than what the normalization produced.

4.2 years ago by Devon Ryan 104k

Just a guess: maybe these samples were fundamentally different? A different cell type, or cancer? The author did not mention whether there are any such differences...

4.2 years ago by German.M.Demidov ★ 2.9k

That could be, though the post-normalization beta distributions look nothing like either normal or diseased mammalian samples.

4.2 years ago by Devon Ryan 104k

Then I'd advise the author to normalize with an alternative method (e.g. RnBeads, it is impossible to make a mistake there) and compare the results. I agree, these plots do not look right.

4.2 years ago by German.M.Demidov ★ 2.9k

score 1 · Answer 1 · 2020-01-30

I think the comments pretty much answer the question, but I would recommend using a different normalization. Sometimes a popular normalization method can cause problems that are obvious upon visual inspection of the density distributions (like the ones you have shown). So, you should try to figure out what works best for your specific dataset.
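To make "visual inspection of the density distributions" a bit more concrete, here is a minimal Python sketch (not part of ChAMP, minfi, or GenomeStudio) that bins beta values into a histogram-based density and reports how many probes fall near 0, in the middle, and near 1. Beta values from human samples tend to be bimodal, so a normalized sample whose values pile up in the middle would stand out. The function names, the 20-bin histogram, the 0.2/0.8 cut-offs, and the toy values are all assumptions chosen for illustration.

```python
def beta_density(betas, n_bins=20):
    """Histogram-based density estimate of beta values on [0, 1]."""
    if not betas:
        raise ValueError("no beta values supplied")
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for b in betas:
        if not 0.0 <= b <= 1.0:
            raise ValueError(f"beta value out of range: {b}")
        # A beta of exactly 1.0 is placed in the last bin.
        idx = min(int(b * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(betas)
    # Dividing by (total * width) makes the histogram integrate to 1.
    return [c / (total * width) for c in counts]


def mode_fractions(betas, low=0.2, high=0.8):
    """Fraction of probes below `low`, above `high`, and in between.

    The 0.2 / 0.8 cut-offs are illustrative assumptions, not a standard.
    """
    n = len(betas)
    if n == 0:
        raise ValueError("no beta values supplied")
    n_low = sum(1 for b in betas if b < low)
    n_high = sum(1 for b in betas if b > high)
    return {
        "low": n_low / n,
        "mid": (n - n_low - n_high) / n,
        "high": n_high / n,
    }


# Made-up toy values, only to show the call pattern.
raw_toy = [0.03, 0.05, 0.08, 0.12, 0.50, 0.85, 0.90, 0.93, 0.96, 0.97]
normalized_toy = [0.30, 0.35, 0.40, 0.45, 0.50, 0.52, 0.55, 0.60, 0.65, 0.70]

print("raw:       ", mode_fractions(raw_toy))
print("normalized:", mode_fractions(normalized_toy))
```

Running the same summary on each sample before and after normalization gives a quick numeric companion to the density plot. It does not replace looking at the plot itself.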

I often use the most basic normalization, from either GenomeStudio (where you can potentially filter more probes using the detection p-values) or minfi (which includes multiple pre-processing methods, including preprocessIllumina()). That said, I think minfi's probe filtering may be slightly different from, or worse than, exported and re-formatted GenomeStudio beta values for certain datasets.
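For readers unfamiliar with the terms, the following is a minimal Python sketch of the two ideas mentioned here: computing a beta value from methylated and unmethylated intensities, and dropping probes by detection p-value. It is not how GenomeStudio or minfi are implemented internally. The function names, the probe IDs, the intensities, and the 0.01 threshold are assumptions for illustration. The offset of 100 follows the commonly used Illumina convention for beta values.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset); the offset stabilizes low-intensity probes."""
    if meth < 0 or unmeth < 0:
        raise ValueError("intensities must be non-negative")
    return meth / (meth + unmeth + offset)


def filter_probes(probes, max_detection_p=0.01):
    """Keep probes whose detection p-value is below the threshold.

    `probes` maps a probe ID to (methylated, unmethylated, detection_p).
    Returns a dict of probe ID -> beta value for the probes that pass.
    The 0.01 default is an assumed, commonly used cut-off.
    """
    kept = {}
    for probe_id, (meth, unmeth, det_p) in probes.items():
        if det_p < max_detection_p:
            kept[probe_id] = beta_value(meth, unmeth)
    return kept


# Hypothetical probes with made-up intensities and p-values.
toy_probes = {
    "probe_a": (900.0, 0.0, 0.001),    # strong signal, mostly methylated
    "probe_b": (50.0, 60.0, 0.20),     # near background, fails detection
    "probe_c": (100.0, 1800.0, 0.002), # strong signal, mostly unmethylated
}

print(filter_probes(toy_probes))
```

In an actual minfi workflow, the corresponding steps would be along the lines of detectionP() and getBeta() on the preprocessed object, so the sketch is only meant to show what is being filtered and why.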