I was wondering if someone could comment on, or point me in the right direction regarding, considerations when normalising Illumina 450k methylation data where there are large differences in global methylation status. In my experiment the same cell line is used across 24 samples, but with different treatments and timepoints. One of the treatments is decitabine, for example, which causes marked global demethylation, seen as a leftward shift in the global beta value profile (see Figure 5A below for an example).
I have read around the area a bit, and much of the literature is concerned with differences between cancer and normal samples or between tissues. This guide from Brent Pedersen was particularly helpful: https://github.com/brentp/450k-analysis-guide
The minfi vignette and associated papers were also useful, but what struck me was a comment in the Dedeurwaerder et al. (2014) review, which is also quoted in the minfi Functional Normalisation paper:
"There is to date no between-array normalization method suited to 450K data that can bring enough benefit to counterbalance the strong impairment of data quality they can cause on some data sets."
So, am I best off doing just a bare-minimum within-array normalisation using, say, the preprocessRaw function in minfi, and no between-array normalisation at all?
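For reference, minfi's preprocessRaw converts the red/green channel intensities into methylated (M) and unmethylated (U) signals without applying any normalisation, and beta values are then M / (M + U + offset). The sketch below is a plain-Python illustration (not minfi code) of that calculation, using Illumina's conventional offset of 100. It also shows one simple way to put a number on a global shift such as the decitabine effect: the difference in median beta between two samples. The intensities are made-up toy numbers, and the helper names are hypothetical.

```python
from statistics import median

ILLUMINA_OFFSET = 100  # Illumina's conventional offset in the beta denominator


def beta_values(meth, unmeth, offset=ILLUMINA_OFFSET):
    """Per-probe beta = M / (M + U + offset), with negative signals clipped to 0."""
    betas = []
    for m, u in zip(meth, unmeth):
        m, u = max(m, 0.0), max(u, 0.0)
        betas.append(m / (m + u + offset))
    return betas


def global_shift(betas_a, betas_b):
    """Median beta of b minus median beta of a; negative means b is globally less methylated."""
    return median(betas_b) - median(betas_a)


# Toy intensities for five probes (hypothetical numbers, not real 450k data).
control_M = [9000, 8500, 300, 7000, 4000]
control_U = [900, 1400, 9500, 2900, 5800]
treated_M = [5000, 4200, 250, 3500, 2000]  # e.g. after a demethylating agent
treated_U = [4900, 5700, 9550, 6400, 7800]

control = beta_values(control_M, control_U)
treated = beta_values(treated_M, treated_U)
print(round(global_shift(control, treated), 3))  # → -0.35
```

A median is used rather than a mean only because it is less sensitive to the bimodal shape of 450k beta distributions. Either summary gives a single number per comparison, which says nothing about where in the genome the shift happens.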
Any comments gratefully received.
Phil Chapman, CRUK Manchester Institute
Thanks very much for the reply, Jean-Philippe. From my reading I thought your method would be appropriate, so I compared the MDS and density plots of the same dataset either completely un-normalised or after running preprocessFunnorm(). The groups seem to cluster more tightly after noob but spread out again after funnorm, and the shape of the density plot changes too. I wasn't quite sure how to interpret this, so it would be great to hear your thoughts.
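To make "the groups cluster more tightly" a bit more concrete, here is a plain-Python sketch of classical (Torgerson) MDS on a probes x samples beta matrix, restricted to the most variable probes. minfi's mdsPlot also works from the most variable positions (its numPositions argument), but this is an independent illustration, not minfi's implementation. The function names and the toy beta matrix are hypothetical.

```python
import math
import random
from statistics import pvariance


def top_variable_rows(matrix, n):
    """Keep the n probes (rows) with the highest variance across samples."""
    return sorted(matrix, key=pvariance, reverse=True)[:n]


def classical_mds(matrix, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS of the samples (columns) of a probes x samples matrix.

    Uses Euclidean distances, then power iteration with deflation to get the
    top-k eigenvectors; returns one k-tuple of coordinates per sample.
    """
    n = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n)]
    # Squared Euclidean distances between samples.
    d2 = [[sum((x - y) ** 2 for x, y in zip(cols[i], cols[j])) for j in range(n)]
          for i in range(n)]
    # Double centring: B = -1/2 * J D2 J, where J = I - 11'/n.
    row_mean = [sum(r) / n for r in d2]
    grand = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):  # power iteration towards the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate B so the next axis picks up the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [tuple(c) for c in coords]


# Hypothetical betas: 4 probes x 6 samples; samples 0-2 form one group, 3-5 another.
betas = [
    [0.90, 0.88, 0.91, 0.40, 0.42, 0.39],
    [0.80, 0.82, 0.79, 0.30, 0.31, 0.33],
    [0.10, 0.11, 0.09, 0.10, 0.12, 0.11],
    [0.50, 0.52, 0.49, 0.51, 0.50, 0.48],
]
coords = classical_mds(top_variable_rows(betas, 2))
for sample, (x, y) in enumerate(coords):
    print(sample, round(x, 3), round(y, 3))
```

The sign of each MDS axis is arbitrary. Only the relative positions of samples matter, which is why a normalisation that shrinks within-group distances shows up as tighter clusters.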
Please see my report on RPubs for more detail: http://www.rpubs.com/chapmandu2/91237
Thanks again, Phil
I just looked at your RPubs report (pretty nice!). It does seem that you've got tighter clusters with noob. In my experience, when the sample size is small (n=19 in your study, correct?), noob by itself performs best. However, you might want to try preprocessFunnorm() with different numbers of principal components (nPCs = 1, 2, ..., 5). Otherwise, I would use preprocessFunnorm() with the following parameters:
preprocessNoob() and performs a quantile normalization on the Y chromosome by sex.
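Quantile normalisation forces every sample onto the same distribution. That is why applying it across all probes is risky when the global distributions genuinely differ, as with decitabine. The minimal plain-Python sketch below (hypothetical helper name, toy beta values, ties broken by position as a simplification) shows a globally demethylated sample becoming indistinguishable from its control after quantile normalisation.

```python
def quantile_normalise(samples):
    """Quantile-normalise equal-length samples (lists) to their mean quantiles.

    Each value is replaced by the average, across samples, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples need the same number of probes")
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[r] for col in sorted_cols) / len(samples) for r in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


control = [0.90, 0.85, 0.03, 0.70, 0.40]
treated = [0.50, 0.42, 0.03, 0.35, 0.20]  # globally demethylated (toy values)
qn_control, qn_treated = quantile_normalise([control, treated])
print(qn_control == qn_treated)  # True: the global shift has been normalised away
```

Because the two toy samples rank their probes identically, they become exactly equal. With real data only the rank differences between samples survive, and any genuine global shift is removed.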
Hope this helps!
Thanks again, Jean-Philippe; that's really useful insight. There are actually 24 samples; it's quite difficult to tell with filled circles in the plot. I also found your BioC 2014 minfi tutorial, which gives some information on the QC features. It seems that two of my samples fall below the expected line, though not by much. I'll try excluding these from the analysis.
A further question: how would you go about looking for differentially methylated regions (DMRs) when there is such a substantial global demethylation? In a sense, what I'm looking for is any regions that are differentially methylated by more or less than the global shift. Do you think bumphunter could be used in this context in some way? I'm imagining you might add a constant or something similar to the model.
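Purely as an illustration of "more or less than the global shift", and not a substitute for a region-level method such as bumphunter, one naive approach works per CpG. Compute the difference in mean beta (treated minus control), take the median difference across all CpGs as the global shift, and flag CpGs whose difference departs from it by more than a chosen threshold. The helper names, the 0.2 threshold and the toy betas are hypothetical. A single constant may also describe the shift poorly if it is driven by a few large demethylated blocks.

```python
from statistics import mean, median


def shift_adjusted_differences(control, treated):
    """Per-CpG difference in mean beta (treated - control), minus the global median difference.

    control, treated: lists of per-sample beta lists, same CpG order in every sample.
    Returns (global_shift, adjusted_differences).
    """
    n_cpg = len(control[0])
    diffs = [mean(s[i] for s in treated) - mean(s[i] for s in control) for i in range(n_cpg)]
    global_shift = median(diffs)
    return global_shift, [d - global_shift for d in diffs]


def flag_departures(adjusted, threshold=0.2):
    """Indices of CpGs whose change differs from the global shift by more than threshold."""
    return [i for i, d in enumerate(adjusted) if abs(d) > threshold]


control = [[0.90, 0.85, 0.10, 0.70, 0.80], [0.88, 0.86, 0.12, 0.72, 0.78]]
treated = [[0.50, 0.45, 0.10, 0.30, 0.79], [0.52, 0.47, 0.11, 0.32, 0.81]]
shift, adjusted = shift_adjusted_differences(control, treated)
print(round(shift, 2), flag_departures(adjusted))  # → -0.38 [2, 4]
```

Here CpGs 2 and 4 are flagged because they did not follow the roughly -0.38 global drop. This is the kind of "resisted the demethylation" signal described above, but without any region-level smoothing or statistical model.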
This is a good reminder that we need to update the minfi vignette (it is badly outdated). The QC line was defined using blood samples with no global hypo- or hypermethylation, and is therefore not relevant for your study; for instance, in my experience most tumor samples fall below this line.
For the DMR analysis, I don't think there is a general answer to your question. It is hard to define the global shift between your samples, since it could be a combination of several small regions with large shifts and/or large regions of hypomethylation ("hypomethylation blocks"), etc. You might first want to check whether there are large blocks of hypomethylation between your different treatments. The devel version of minfi includes code to do that: https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R
If you find blocks, then I would run bumphunter and see if you get DMRs outside of those blocks (you probably will).
Great, thanks. I'll take a look at that code. I just don't want to do a standard DMR analysis because I have a sense that everything will change!! Regarding the minfi vignette, it would be really useful to have some additional insight into normalisation approaches. I didn't use minfi initially for my analysis (I used lumi instead) simply because it wasn't clear to me what a sensible approach was, whereas lumi seemed a bit more explicit. The information was there, of course; I just had to read the papers, which I subsequently did. Still, it would be really useful to have a sensible overview and advice on which normalisation to use when (perhaps drawing on some of the comments here).
Happy to contribute or comment further on the documentation side if it helps. I can't write R well enough to develop packages, but I can do documentation... :)