Illumina HumanMethylation 450k dataset normalization
10 months ago
aggelos529 ▴ 10

Hello everyone, I am using this dataset. It contains 7 samples: 6 patients have a type of lymphoma and the 7th has a myeloma. I am following this Bioconductor workflow as a guide for my analysis. My current problem is the normalization step. To normalize the data, the workflow uses the minfi function preprocessQuantile, and the result there is very good (see the workflow for an image of the data before and after normalization). However, when I use this function to normalize my dataset, the result is this. I searched PubMed and found that there are many different normalization methods, but I cannot decide which is best overall (or best suited to my dataset). I could in theory try all of them, but I am not familiar enough with R to set up my data the way each package expects. I would appreciate it if someone could help me with this normalization problem. My code and the sample sheet are provided below.
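To make clear what I understand quantile normalization to do, here is a minimal base R sketch on simulated beta values. It is only an illustration of the plain technique: the function name quantile_normalize and the toy data are made up for this example, and as far as I understand preprocessQuantile is a stratified variant that works on the methylated and unmethylated intensities, so this is not what minfi runs internally.

```r
# Plain quantile normalization on a probes x samples matrix (toy sketch;
# quantile_normalize is a hypothetical helper, not a minfi function).
quantile_normalize <- function(x) {
  # rank of each value within its own sample (column)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # each column sorted in ascending order
  sorted <- apply(x, 2, sort)
  # reference distribution: mean of the k-th smallest values across samples
  reference <- rowMeans(sorted)
  # replace every value by the reference value at the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

set.seed(1)
toy <- cbind(
  sampleA = rbeta(1000, 0.5, 0.5),
  sampleB = rbeta(1000, 0.5, 0.5),
  sampleC = rbeta(1000, 2, 5)  # deliberately different global distribution
)
toy_qn <- quantile_normalize(toy)

# after normalization the per-sample quantiles are identical
apply(toy_qn, 2, quantile)
```

Every sample ends up with the same set of values, so any genuine global difference between samples (sampleC here) is removed along with the technical variation. That is why I am unsure whether this approach suits a dataset that mixes lymphoma and myeloma samples.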



library(limma)
library(minfi)
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(IlluminaHumanMethylation450kmanifest)
library(RColorBrewer)
library(missMethyl)
library(minfiData)
library(Gviz)
library(DMRcate)
library(stringr)

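# folder holding the IDAT files and the sample sheet
# (placeholder path; set this to your own data folder)
dataDirectory <- "path/to/idat_folder"

# check the contents of the directory
list.files(dataDirectory, recursive=TRUE)

# get the 450k annotation data
ann450k <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)

# read in the sample sheet for the experiment
targets <- read.metharray.sheet(dataDirectory)

# read in the raw data from the IDAT files
rgSet <- read.metharray.exp(targets=targets)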

# give the samples descriptive names
targets$ID <- paste(targets$Sample_Group, targets$Sample_Name, sep=".")
sampleNames(rgSet) <- targets$ID
rgSet

# calculate the detection p-values
detP <- detectionP(rgSet)

# examine mean detection p-values across all samples to identify any failed samples
pal <- brewer.pal(8,"Dark2")
par(mfrow=c(1,2))

# full-scale view of the mean detection p-values
barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
        cex.names=0.8, ylab="")
abline(h=0.001, col="red")
title(ylab="Mean detection p-values", line=3.3, cex.lab=1.2)
legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal,
       bg="white")

# same plot zoomed in on the y-axis
barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
        cex.names=0.8, ylim=c(0,0.002), ylab="")
abline(h=0.001, col="red")
legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal,
       bg="white")

# remove poor quality samples
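keep <- colMeans(detP) < 0.05
rgSet <- rgSet[,keep]

# drop the same samples from the sample sheet and the detection p-value
# table so they stay aligned with rgSet in the plots below
targets <- targets[keep,]
detP <- detP[,keep]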

# normalize the data; this results in a GenomicRatioSet object
mSetSq <- preprocessQuantile(rgSet)
mSetRaw <- preprocessRaw(rgSet)

# visualise what the data looks like before and after normalisation
par(mfrow=c(1,2))

densityPlot(rgSet, sampGroups=targets$Sample_Group, main="Raw", legend=FALSE)
legend("top", legend=levels(factor(targets$Sample_Group)),
       text.col=brewer.pal(5,"Dark2"))

densityPlot(getBeta(mSetSq), sampGroups=targets$Sample_Group,
            main="Normalized", legend=FALSE)
legend("top", legend=levels(factor(targets$Sample_Group)),
       text.col=brewer.pal(5,"Dark2"))
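Since minfi itself ships several preprocessing functions that all take the same RGChannelSet, one way to compare methods without reshaping the data for other packages might be the sketch below. It uses RGsetEx from minfiData as a stand-in so that it runs on its own; that stand-in and the variable name normalized are assumptions for illustration, and the comparison is only visual, so it does not say which method is "best".

```r
library(minfi)
library(minfiData)  # provides RGsetEx, a small example RGChannelSet

# Stand-in for the rgSet built above; replace with your own object.
rgSet <- RGsetEx

# Apply several minfi preprocessing methods to the same raw data.
normalized <- list(
  Raw      = preprocessRaw(rgSet),       # no normalization
  Illumina = preprocessIllumina(rgSet, bg.correct = TRUE,
                                normalize = "controls"),
  SWAN     = preprocessSWAN(rgSet),      # within-array probe-type adjustment
  Noob     = preprocessNoob(rgSet),      # background and dye-bias correction
  Quantile = preprocessQuantile(rgSet),  # stratified quantile normalization
  Funnorm  = preprocessFunnorm(rgSet)    # functional normalization
)

# One density plot of beta values per method, for a side-by-side look.
par(mfrow = c(2, 3))
for (method in names(normalized)) {
  densityPlot(getBeta(normalized[[method]]),
              sampGroups = pData(rgSet)$Sample_Group,
              main = method, legend = FALSE)
}
```

getBeta works on every object in the list, whether the function returned a MethylSet or a GenomicRatioSet, so the plotting loop does not need to know which method produced it.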

