Question: Your Thoughts On A "Standard" Pipeline To Process Illumina 450K Data
8.0 years ago by
Sydney, Australia
Neilfws48k wrote:

I've recently started working with the Illumina 450K methylation platform. There are several software packages available to handle this data including methylumi, lumi, minfi (those 3 from Bioconductor) and IMA. I'm disregarding IMA since it requires text files exported from BeadStudio in a particular format (which I don't have) and I prefer to start from IDAT files.

The packages are similar in that they create an R object based on the eSet class, but they all come with different methods for adjusting colour bias and normalizing. I'm finding the number of choices rather confusing. For example:

  • methylumi has a rather basic method, normalizeMethyLumiSet(), which does not seem entirely appropriate for the 450K platform
  • lumi has methods for colour bias correction, background adjustment and normalization; it's not clear to me whether these methods should be applied separately to the type I and type II probes on the 450K platform (and if so, whether I'd then somehow recombine the data)
  • minfi makes no mention of colour bias but has a method in the development version, preprocessSWAN(), which does normalization accounting for differences in type I/type II probes

So my questions are:

  • Which package do you use? Or do you use more than one, in combination?
  • Should I even worry about colour bias adjustment? And if so, should I treat type I and II probes differently? And if so, how?
  • The "best" method, in your opinion, to normalize? Using lumi - ssn or quantile? Or use minfi? Treat colours separately or not? Treat type I/II probes separately or not?

My current feeling is that preprocessSWAN() in the minfi development version is the way to go, but I'd appreciate your thoughts (and especially, your R code).
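For anyone wondering what "normalize type I and type II probes separately, then recombine" could look like mechanically, here is a minimal base-R sketch. It is only an illustration under stated assumptions: the intensity matrix is simulated, the probe-type vector is made up, and the helper names quantile_normalize() and normalize_by_probe_type() are hypothetical. It is not how lumi or minfi implement their methods, and it is not SWAN.

```r
# Plain quantile normalization across samples (rows = probes, columns = samples).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its sample
  sorted <- apply(x, 2, sort)                          # each sample sorted ascending
  target <- rowMeans(sorted)                           # shared reference distribution
  out <- apply(ranks, 2, function(r) target[r])        # map ranks back onto the reference
  dimnames(out) <- dimnames(x)
  out
}

# Normalize each probe type on its own, writing results back in the original row order.
normalize_by_probe_type <- function(intensity, probe_type) {
  out <- intensity
  for (type in unique(probe_type)) {
    idx <- probe_type == type
    if (sum(idx) < 2) next  # too few probes of this type to normalize
    out[idx, ] <- quantile_normalize(intensity[idx, , drop = FALSE])
  }
  out
}

# Simulated example: 8 probes, 3 samples (all values are made up).
set.seed(1)
intensity <- matrix(rexp(8 * 3, rate = 1 / 1000), nrow = 8,
                    dimnames = list(paste0("cg", 1:8), paste0("s", 1:3)))
probe_type <- rep(c("I", "II"), times = c(3, 5))
normalized <- normalize_by_probe_type(intensity, probe_type)
print(normalized)
```

Because the rows are written back by index, "recombining" is just keeping the original matrix shape; each probe type ends up with the same distribution across samples, but the two types are never forced onto a common distribution.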

modified 6.2 years ago by Charles Warden7.6k • written 8.0 years ago by Neilfws48k
8.0 years ago by
Aaron Statham1.1k wrote:

Below is my code for using minfi. I get a Pearson correlation of 0.95 between beta values from a 450k array and whole-genome bisulfite sequencing (cell line), so at least in my situation I don't know how much more there is to gain.

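library(minfi)
# slide.folder: directory holding the IDAT files; files.table: sample sheet (both supplied by the user)
RG.raw <- read.450k.exp(base = slide.folder, targets = files.table)  # renamed read.metharray.exp() in newer minfi releases
methyl.norm <- preprocessIllumina(RG.raw, bg.correct = TRUE, normalize = "controls")  # background correction + control normalization
beta.table <- getBeta(methyl.norm)  # matrix of beta values, probes x samples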
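The array-versus-sequencing comparison mentioned above can be sketched in base R as follows. This is a rough illustration, not the exact procedure used for the 0.95 figure: the two small tables are made-up values, and matching on a shared "cpg" identifier is an assumption (real sequencing calls would normally be matched to probes by genomic position).

```r
# Made-up beta values from the array, keyed by a hypothetical CpG identifier.
array_beta <- data.frame(
  cpg  = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  beta = c(0.05, 0.90, 0.48, 0.12, 0.77)
)

# Made-up methylation fractions from bisulfite sequencing for an overlapping set of sites.
seq_meth <- data.frame(
  cpg  = c("cg02", "cg03", "cg04", "cg05", "cg06"),
  frac = c(0.95, 0.40, 0.08, 0.85, 0.50)
)

# Keep only the sites measured on both platforms, then correlate.
shared <- merge(array_beta, seq_meth, by = "cpg")
r <- cor(shared$beta, shared$frac, method = "pearson")
cat("shared sites:", nrow(shared), " Pearson r:", round(r, 3), "\n")
```

The inner join matters: sites present on only one platform are dropped before the correlation is computed.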
written 8.0 years ago by Aaron Statham1.1k

Hah, at the moment I'm not paid to worry. There are always improvements to be made, but a 0.85/0.95 correlation is good enough for me until someone does some serious benchmarking.

written 8.0 years ago by Aaron Statham1.1k

Wow! That's a great correlation. Is that the norm for 450k?

written 8.0 years ago by brentp23k

The worst I've gotten between 450k and bisulfite seq is 0.85, and that was comparing primary cells (grown for a short time in culture) isolated from two different patients, i.e. patient 1 on 450k, patient 2 on bis-seq.

written 8.0 years ago by Aaron Statham1.1k

This code is straight from the minfi user guide. I tend to agree though, that it is as good as anything. You don't worry about color bias, treating type I/II probes separately or the SWAN method?

written 8.0 years ago by Neilfws48k

Hello! I came across this old post while searching for methylation data analysis. I have data from control samples, one unmethylated and one methylated, from both bisulfite sequencing and 450K. (Ideally the unmethylated control sample should have 0% methylation and the methylated sample should have 100%, but this is certainly not the case.) I tried to correlate the results between 450K and sequencing, only including the sites that are present in both. I used the percentage of methylation (the beta value in 450K). I did not use any of the above packages but got the data straight from Genome Studio.

I got a ~0.88 correlation for the unmethylated control sample, but only 0.07 for the methylated control. Any idea how this could be? Thanks in advance!

written 5.1 years ago by cafelumiere1270
6.2 years ago by
Duarte, CA
Charles Warden7.6k wrote:

I guess this is a somewhat old post, but I would recommend using COHCAP for 450k array analysis:

I have used Genome Studio for processing / normalization, and I use COHCAP for QC, differential methylation (for CpG sites as well as CpG islands), and integration with gene expression data (if relevant).

I haven't yet tested the tools described in this link, but I would currently agree that minfi is probably OK for normalization. I think the additional normalization (e.g. SWAN) has a relatively modest effect. It seems to me that the beta values used for analysis are less different than the raw intensity values, the relative frequencies of I vs. II are pretty different (see Figure 6 in the SWAN paper), and my personal opinion is that it is best to consider differentially methylated regions rather than individual probes / sites (for example, I think these relatively minor differences should be averaged out across the multiple probes within the CpG island).
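As a rough picture of what averaging across the probes within a CpG island means, here is a minimal base-R sketch. The beta values and the probe-to-island mapping are made up for illustration (in practice the mapping would come from the array annotation), and this is not a reproduction of COHCAP's own code.

```r
# Made-up beta values: 6 probes (rows) in 2 samples (columns).
beta <- matrix(
  c(0.10, 0.80,
    0.20, 0.90,
    0.15, 0.85,
    0.70, 0.72,
    0.75, 0.78,
    0.80, 0.75),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("cg", 1:6), c("s1", "s2"))
)

# Hypothetical probe-to-island assignment, one entry per probe (row).
island <- c("islandA", "islandA", "islandA", "islandB", "islandB", "islandB")

# Average the probes within each island, separately for every sample.
island_beta <- apply(beta, 2, function(sample) tapply(sample, island, mean))
print(island_beta)
```

Each island-level value is the mean of its probes, so a small shift in any single probe moves the region value by only a fraction of that shift.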

Also, I think this comes with an assumption of using p-value / FDR alone or using a delta-beta value as a cutoff. I personally like using a methylated and unmethylated cutoff, so that I preferentially look at regions with methylation values that at least roughly follow the bimodal distribution in beta values, especially for cell line experiments. In other words, if you look for sites / regions where the average beta is > 0.7 in one group and <0.3 in the other group (or >0.3 in one group and <0.3 in the other group), it doesn't really matter as much if some of the probes for some of the unmethylated CpG sites show beta values closer to 0.2 than 0.3 (like in Figure 4C or Figure 6 of the SWAN paper)
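The methylated / unmethylated cutoff idea can be sketched in base R like this. It is a minimal illustration using the 0.7 and 0.3 thresholds from the text; the region-level beta values, the group labels, and the function name find_switches() are made up, and it is not COHCAP's implementation.

```r
# Flag regions whose group means sit on opposite sides of the two cutoffs.
find_switches <- function(beta, group, methylated = 0.7, unmethylated = 0.3) {
  groups <- unique(group)
  stopifnot(length(groups) == 2)
  mean1 <- rowMeans(beta[, group == groups[1], drop = FALSE])
  mean2 <- rowMeans(beta[, group == groups[2], drop = FALSE])
  hit <- (mean1 > methylated & mean2 < unmethylated) |
         (mean2 > methylated & mean1 < unmethylated)
  data.frame(region = rownames(beta), mean1, mean2, delta = mean1 - mean2)[hit, ]
}

# Made-up region-level beta values: 4 regions, 2 samples per group.
beta <- matrix(
  c(0.85, 0.80, 0.10, 0.20,   # r1: methylated in A, unmethylated in B
    0.50, 0.55, 0.45, 0.50,   # r2: intermediate in both groups
    0.20, 0.25, 0.75, 0.80,   # r3: unmethylated in A, methylated in B
    0.65, 0.60, 0.25, 0.20),  # r4: large difference, but A is below the 0.7 cutoff
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("r", 1:4), c("A1", "A2", "B1", "B2"))
)
group <- c("A", "A", "B", "B")

hits <- find_switches(beta, group)
print(hits)
```

Region r4 shows how this differs from a plain delta-beta cutoff: its group means differ by about 0.4, yet it is not reported because neither group mean is above the methylated threshold.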

SWAN Paper:

modified 6.2 years ago • written 6.2 years ago by Charles Warden7.6k

I was only looking at this post a few days ago, wistfully wishing that there were better answers. Thanks! I have looked at COHCAP but haven't tried it out yet. There's been an explosion of packages, especially in Bioconductor (e.g. wateRmelon, ChAMP) since I asked this question; I'm still searching for that elusive "standard pipeline" if there even is such a thing.

written 6.2 years ago by Neilfws48k

Ok, cool. Actually, the COHCAP Bioconductor package got approved about a week ago, so I think it should be out relatively soon. Hopefully, that helps with usability.

I happened to stumble upon this discussion because I am starting to put together a Protocol Exchange entry for the Bioconductor version of COHCAP, and I wanted every step to be open-source (so I wanted to start with some publicly available .idat files and find a Bioconductor replacement for Genome Studio).

Hopefully, these are things that can be helpful. FYI, RnBeads is another option out there. I personally still like COHCAP the best, but I am obviously biased ;)


modified 6.2 years ago • written 6.2 years ago by Charles Warden7.6k
7.6 years ago by
housebie40 wrote:


I also just started working on Illumina 450K data and came across this post, which I see is from 5 months ago. Since I am totally new to this area, I am trying to figure out the best approach to analyse my 450K data, which has more than 700 samples. I did come across minfi, lumi, methylumi, and IMA, but I am not sure which to use, and I have questions similar to the ones you raised here a few months back.

So I just thought of asking you now, as you might have already worked with quite a few things on that by now.

1) Which package did you use? Or did you use more than one, in combination? I am trying to get my hands on "minfi" right now, considering the recent paper about "SWAN" which seems to be one of the good approaches. But I want to know your experience with it and your suggestion.

2) There is another recent paper "" which describes a complete preprocessing pipeline using an original SQN approach. The paper says that it performs both sample normalization and efficient Infinium I/II shift correction. Has anyone used this? If yes, how do you find it?

modified 7.6 years ago • written 7.6 years ago by housebie40