Illumina HumanMethylation 450k dataset normalization

Posted 3.5 years ago by aggelos529

Hello everyone, I am using this dataset. It contains 7 samples: 6 patients have a type of lymphoma and the 7th has a myeloma. I am following this Bioconductor workflow as a guide for my analysis.

My current problem is the normalization step. In the guide, the data are normalized with the minfi function preprocessQuantile, and the result looks very good (see the workflow for density plots of the data before and after normalization). However, when I use the same function on my dataset, the result does not look nearly as good.

I searched PubMed and found that there are many different normalization methods, but I cannot decide which one is best overall, or best suited to my dataset. In theory I could try them all, but I am not familiar enough with R to prepare my data in the format each package expects. I would appreciate any help with this normalization problem. My code and the sample sheet are below.
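On the practical side of trying several methods: within minfi, the common normalizations are all functions that accept the same RGChannelSet that `read.metharray.exp()` returns, so comparing them mostly means swapping one call. The sketch below is only an illustration, assuming the example object `RGsetEx` from the minfiData package as a stand-in for your own `rgSet`; it does not indicate which method is best for this dataset.

```r
# Sketch: apply several minfi normalizations to one RGChannelSet.
# Assumption: RGsetEx (from minfiData) stands in for your own rgSet.
library(minfi)
library(minfiData)

data(RGsetEx)
rgSet <- RGsetEx

set.seed(1)  # preprocessSWAN uses a random probe subset; fix the seed for reproducibility

normalized <- list(
  Raw      = preprocessRaw(rgSet),       # MethylSet, no normalization
  Illumina = preprocessIllumina(rgSet),  # GenomeStudio-style control normalization
  SWAN     = preprocessSWAN(rgSet),      # adjusts Type I vs Type II probe distributions
  Noob     = preprocessNoob(rgSet),      # background + dye-bias correction
  Funnorm  = preprocessFunnorm(rgSet),   # control-probe based; returns a GenomicRatioSet
  Quantile = preprocessQuantile(rgSet)   # stratified quantile; returns a GenomicRatioSet
)

# Every result supports getBeta(), so downstream code can stay the same.
betas <- lapply(normalized, getBeta)
sapply(betas, dim)
```

Because each result yields a beta matrix with one column per sample, the same density plots or downstream steps can be run on any of them.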

```r

library(limma) 
library(minfi)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(IlluminaHumanMethylation450kmanifest)
library(RColorBrewer)
library(missMethyl)
library(minfiData)
library(Gviz)
library(DMRcate)
library(stringr)

#set datadir
dataDirectory <-"C:/Users/angelosgk2/Desktop/R things/Datasets/E-MTAB-3184raw"

#check the contents of the directory
list.files(dataDirectory) 

# get the 450k annotation data
ann450k <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
head(ann450k)

# read in the sample sheet for the experiment
targets <- read.metharray.sheet(dataDirectory, pattern="SampleSheet - Copy.csv")
targets

# read in the raw data from the IDAT files
rgSet <- read.metharray.exp(targets=targets)
rgSet

# give the samples descriptive names
targets$ID <- paste(targets$Sample_Group,targets$Sample_Name,sep=".")
sampleNames(rgSet) <- targets$ID
rgSet

# calculate the detection p-values
detP <- detectionP(rgSet)
head(detP)

# examine mean detection p-values across all samples to identify any failed samples
pal <- brewer.pal(8,"Dark2")
par(mfrow=c(1,2))

barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2, 
        cex.names=0.8, ylab="")
abline(h=0.001,col="red")
title(ylab="Mean detection p-values", line=3.3, cex.lab=1.2)
legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal,
       bg="white")

barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2, 
        cex.names=0.8, ylim=c(0,0.002), ylab="")
abline(h=0.001,col="red")
legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal, 
       bg="white")

# remove poor quality samples; keep targets and detP in sync with rgSet
keep <- colMeans(detP) < 0.05
rgSet <- rgSet[,keep]
targets <- targets[keep,]
detP <- detP[,keep]

# normalize the data; this results in a GenomicRatioSet object
mSetSq <- preprocessQuantile(rgSet)
# unnormalized MethylSet, kept for comparison
mSetRaw <- preprocessRaw(rgSet)

# visualise what the data looks like before and after normalisation
par(mfrow=c(1,2))

densityPlot(rgSet, sampGroups=targets$Sample_Group,main="Raw", legend=FALSE)
legend("top", legend = levels(factor(targets$Sample_Group)),
      text.col=brewer.pal(5,"Dark2"))

densityPlot(getBeta(mSetSq), sampGroups=targets$Sample_Group,
            main="Normalized", legend=FALSE)
legend("top", legend = levels(factor(targets$Sample_Group)),
      text.col=brewer.pal(5,"Dark2"))
```
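A rule of thumb often quoted for minfi (and stated in the Bioconductor methylation workflow) is that preprocessQuantile suits datasets where no global methylation differences are expected, such as a single tissue, while preprocessFunnorm is more appropriate when global differences are expected, such as cancer versus normal or very different tissue types. Whether a lymphoma/myeloma mix falls into the second group is a judgement call, so plotting both side by side is a reasonable check. The sketch assumes minfiData's `RGsetEx` and its `Sample_Group` column in place of your own `rgSet` and `targets$Sample_Group`.

```r
# Sketch: compare raw, Funnorm and quantile-normalized beta densities.
# Assumption: RGsetEx and its Sample_Group column (minfiData) stand in
# for your own rgSet and targets$Sample_Group.
library(minfi)
library(minfiData)
library(RColorBrewer)

data(RGsetEx)
rgSet  <- RGsetEx
groups <- pData(rgSet)$Sample_Group
lev    <- levels(factor(groups))
pal    <- brewer.pal(8, "Dark2")       # densityPlot's default palette

mSetFn <- preprocessFunnorm(rgSet)     # suggested when global differences are expected
mSetSq <- preprocessQuantile(rgSet)    # suggested when they are not

par(mfrow = c(1, 3))
densityPlot(rgSet, sampGroups = groups, main = "Raw", legend = FALSE)
densityPlot(getBeta(mSetFn), sampGroups = groups, main = "Funnorm", legend = FALSE)
densityPlot(getBeta(mSetSq), sampGroups = groups, main = "Quantile", legend = FALSE)
legend("top", legend = lev, text.col = pal[seq_along(lev)])
```

With real data, swapping `RGsetEx` for the filtered `rgSet` and `pData(rgSet)$Sample_Group` for `targets$Sample_Group` gives the same three panels for this dataset.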

Tags: R methylation data normalization
