I previously posted "Tutorial: Importance of Array Quality Control - arrayQualityMetrics (PART I)". I have been analyzing the Rembrandt data (brain tumor) to look for new insights into glioma, and this post is part of that analysis. In conclusion, the clustering analysis indicated that the Rembrandt data show a gender-specific gene expression pattern in the 43 genes selected by nonspecific gene filtering.
# Access the Linux server
# Go to the folder where the Rembrandt glioma data are saved
$ cd Rembrandt_Glioma
# 580 microarrays: astrocytoma, oligodendroglioma, normal, GBM, and unknown samples
$ R # start R
# Import the Rembrandt data into R
library(affy)
mydata <- ReadAffy()

# Multiple-array normalization (RMA)
mydata_rma <- rma(mydata)

# Array quality control with arrayQualityMetrics
library(arrayQualityMetrics)

# arrayQualityMetrics report for mydata (raw data)
arrayQualityMetrics(expressionset=mydata, outdir="Report_for_Rembrandt_RMA", force=TRUE, do.logtransform=TRUE)

# arrayQualityMetrics report for mydata_rma (normalized data)
arrayQualityMetrics(expressionset=mydata_rma, outdir="Report_for_nRembrandt_RMA", force=TRUE)

# Save the normalized expression matrix (probes in rows, samples in columns).
# exprs() extracts the matrix from the ExpressionSet returned by rma().
write.table(exprs(mydata_rma), file="Rembrandt_RMA_QC.txt", sep="\t", quote=FALSE, row.names=TRUE, col.names=TRUE)
Based on the arrayQualityMetrics report, I removed 31 outlier samples out of the 580 by editing the file in Excel. The edited file was saved as a tab-delimited file.
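The same outlier removal can also be done in R rather than Excel, which keeps the step reproducible. The following is only a minimal sketch on a toy matrix: the sample names in `outliers` are hypothetical placeholders, and the real names would have to be taken from the arrayQualityMetrics report.

```r
# Toy expression matrix standing in for the RMA output
# (probes in rows, samples in columns).
set.seed(1)
expr <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("sample", 1:6)))

# Hypothetical outlier sample names (assumption for illustration only).
outliers <- c("sample2", "sample5")

# Keep every column whose name is not in the outlier list.
expr_qc <- expr[, !(colnames(expr) %in% outliers), drop = FALSE]

# Save as a tab-delimited file; col.names = NA writes an empty header
# cell above the row names so that the columns line up.
out_file <- tempfile(fileext = ".txt")
write.table(expr_qc, file = out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back to confirm the round trip.
expr_back <- read.table(out_file, sep = "\t", row.names = 1, header = TRUE)
dim(expr_back)
```

Filtering by column name rather than by position avoids removing the wrong array if the column order changes.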
# Next, I filtered the genes with genefilter and saved the result on my local computer.
mydata <- read.table(file="Rembrandt_RMA_QC.txt", sep="\t", row.names=1, header=TRUE)

# Gene filtering by standard deviation
library(genefilter)
rsd <- rowSds(mydata)

# Keep the rows (features) whose standard deviation is at least 2
i <- rsd >= 2
mydata_filtered <- mydata[i,]
# 43 genes were selected
write.table(mydata_filtered, file="Rembrandt_RMA_QC_filtered.txt", sep="\t", quote=FALSE, row.names=TRUE, col.names=TRUE)

# Next, I assessed the clustering tendency of the filtered dataset.
# (The clustering tendency assessment determines whether a given dataset contains meaningful clusters (1).)
install.packages("clustertend")
library(clustertend)
set.seed(12345)
hopkins(mydata_filtered, n=nrow(mydata_filtered)-1, byrow=TRUE, header=TRUE)
# mydata_filtered: variable is samples and object is genes
$H value : 0.2712307 (If the Hopkins statistic is close to zero, we can reject the null hypothesis that the data are uniformly distributed and conclude that the dataset D is clusterable; a value near 0.5 indicates uniformly distributed data (1))
mydata_filtered_1 <- t(mydata_filtered)
hopkins(mydata_filtered_1, n=nrow(mydata_filtered_1)-1, byrow=TRUE, header=TRUE)
# mydata_filtered_1: variable is genes and object is samples
$H value : 0.288575 (This value is also below 0.5, so by the same criterion the dataset D shows a clustering tendency (1))
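To make the meaning of the H value more concrete, here is a minimal base-R sketch of a Hopkins-type statistic under the convention used above (close to zero means clusterable, around 0.5 means uniform). It is not the clustertend implementation: the function names `nn_dist` and `hopkins_sketch` are made up for illustration, and the use of plain Euclidean distances and sampling without replacement are my assumptions.

```r
# Nearest-neighbour distance from each row of `from` to the rows of `to`.
# If `skip` is given, from[i, ] is the same point as to[skip[i], ],
# so that row is excluded from the search.
nn_dist <- function(from, to, skip = NULL) {
  sapply(seq_len(nrow(from)), function(i) {
    d <- sqrt(rowSums(sweep(to, 2, from[i, ])^2))
    if (!is.null(skip)) d[skip[i]] <- Inf
    min(d)
  })
}

# Hypothetical helper: H = sum(w) / (sum(u) + sum(w)), where
#   u = distances from n uniform artificial points to the nearest real point
#   w = distances from n sampled real points to their nearest real neighbour
hopkins_sketch <- function(X, n) {
  X <- as.matrix(X)
  stopifnot(n < nrow(X))
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  # n artificial points drawn uniformly within the range of each column
  P <- matrix(runif(n * ncol(X), rep(lo, each = n), rep(hi, each = n)), nrow = n)
  # n real points sampled without replacement
  idx <- sample(nrow(X), n)
  u <- nn_dist(P, X)
  w <- nn_dist(X[idx, , drop = FALSE], X, skip = idx)
  sum(w) / (sum(u) + sum(w))
}

set.seed(12345)
# Two well-separated groups versus uniformly scattered points
clustered <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
                   matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
uniform <- matrix(runif(200), ncol = 2)

h_clustered <- hopkins_sketch(clustered, n = 30)
h_uniform <- hopkins_sketch(uniform, n = 30)
print(c(clustered = h_clustered, uniform = h_uniform))
```

On the toy data, the clustered matrix should give a value much closer to zero than the uniform matrix. In clustered data the real points sit close to each other, while random points mostly fall in empty space, so the numerator becomes small relative to the denominator.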
(1) Assessing Cluster Tendency: A vital issue - Unsupervised Machine Learning (http://www.sthda.com)