Hello everyone, I am using this dataset. It contains 7 samples: 6 patients have a type of lymphoma and the 7th has myeloma. I am following this Bioconductor workflow as a guide for my analysis. My current problem is the normalization step. To normalize the data, the guide uses the minfi function preprocessQuantile, and the result is very good (see the workflow for an image of the data before and after normalization). However, when I use this function to normalize my dataset, the result is this. I searched PubMed and found that there are many different normalization methods, but I cannot decide which is best overall (or best suited to my dataset). I could in theory try them all, but I am not familiar enough with R to prepare my data in the format each package expects. I would appreciate any help with this normalization problem. My code and the required sample sheet are below.
    library(limma)
    library(minfi)
    library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
    library(IlluminaHumanMethylation450kmanifest)
    library(RColorBrewer)
    library(missMethyl)
    library(minfiData)
    library(Gviz)
    library(DMRcate)
    library(stringr)

    # set the data directory
    dataDirectory <- "C:/Users/angelosgk2/Desktop/R things/Datasets/E-MTAB-3184raw"
    # check the contents of the directory
    list.files(dataDirectory)

    # get the 450k annotation data
    ann450k <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
    head(ann450k)

    # read in the sample sheet
    targets <- read.metharray.sheet(dataDirectory, pattern="SampleSheet - Copy.csv")
    targets

    # read in the raw data from the IDAT files
    rgSet <- read.metharray.exp(targets=targets)
    rgSet

    # give the samples descriptive names
    targets$ID <- paste(targets$Sample_Group, targets$Sample_Name, sep=".")
    sampleNames(rgSet) <- targets$ID
    rgSet

    # calculate the detection p-values
    detP <- detectionP(rgSet)
    head(detP)

    # examine mean detection p-values across all samples to identify any failed samples
    pal <- brewer.pal(8, "Dark2")
    par(mfrow=c(1,2))
    barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
            cex.names=0.8, ylab="")
    abline(h=0.001, col="red")
    title(ylab="Mean detection p-values", line=3.3, cex.lab=1.2)
    legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal,
           bg="white")

    # same plot, zoomed in on the y axis
    barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
            cex.names=0.8, ylim=c(0,0.002), ylab="")
    abline(h=0.001, col="red")
    legend("topright", legend=levels(factor(targets$Sample_Group)), fill=pal,
           bg="white")

    # remove poor quality samples
    keep <- colMeans(detP) < 0.05
    rgSet <- rgSet[,keep]
    # keep the sample sheet and detection p-values in sync with rgSet
    targets <- targets[keep,]
    detP <- detP[,keep]

    # normalize the data; this results in a GenomicRatioSet object
    mSetSq <- preprocessQuantile(rgSet)
    # unnormalized MethylSet, kept for comparison
    mSetRaw <- preprocessRaw(rgSet)

    # visualise what the data looks like before and after normalisation
    par(mfrow=c(1,2))
    densityPlot(rgSet, sampGroups=targets$Sample_Group, main="Raw", legend=FALSE)
    legend("top", legend=levels(factor(targets$Sample_Group)),
           text.col=brewer.pal(5,"Dark2"))
    densityPlot(getBeta(mSetSq), sampGroups=targets$Sample_Group,
                main="Normalized", legend=FALSE)
    legend("top", legend=levels(factor(targets$Sample_Group)),
           text.col=brewer.pal(5,"Dark2"))
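
For reference, here is a minimal base R sketch of the general idea behind quantile normalization, the kind of method used in the code above. It is a toy illustration only: the matrix `toy` is made-up data and `quantile_normalize_toy` is a hypothetical helper, not minfi's code. As far as I understand from the minfi documentation, preprocessQuantile does considerably more (it normalizes the methylated and unmethylated intensities separately and treats probe types and sex chromosomes specially), so this is not a substitute for it.

```r
# Toy quantile normalization: force every column (sample) to share the same
# distribution. Hypothetical helper for illustration, not minfi's implementation.
quantile_normalize_toy <- function(x) {
  # rank of each value within its own column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # reference distribution: mean of the k-th smallest value across all columns
  reference <- rowMeans(apply(x, 2, sort))
  # replace each value by the reference value that has the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# made-up "beta values" for two samples with very different distributions
set.seed(1)
toy <- cbind(sampleA = rbeta(1000, 0.5, 0.5),
             sampleB = rbeta(1000, 2, 5))
toy_norm <- quantile_normalize_toy(toy)

# compare the sorted values of the two columns after normalization
all.equal(sort(toy_norm[, "sampleA"]), sort(toy_norm[, "sampleB"]))
```

The point of the sketch is that this kind of method makes the samples' distributions identical by construction, so any genuine global difference between samples is removed along with the technical variation.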