Tutorial: Importance of Array Quality Control - arrayQualityMetrics (PART I)
1
gravatar for majuang66
3.0 years ago by
majuang6680
majuang6680 wrote:

Today's topic is the importance of array quality control. Many articles and websites related to microarray analysis have demonstrated that array quality control is privotal process involved in improvement of identification of differential gene expression. I introduce one example associated with array quality through arrayQualityMetrics.

# Open R
# Import ArrayExpress data (E-GEOD-3419) into R
source("http://bioconductor.org/biocLite.R")

biocLite("ArrayExpress")

library(ArrayExpress)

AEset<-ArrayExpress("E-GEOD-3419")

# Normalization of E-GEOD-3419
library(affy)

AEsetnorm<-rma(AEset)

# Array Quality Control through arraQualityMetrics

biocLite("arrayQualityMetrics")

library(arrayQualtiyMetrics)

# arrayQualityMetrics of AEset (RAW data of E-GEOD-3419)
arrayQualityMetrics(expressionset=AEset,outdir="Report_for_E-GEOD-3419",force=TRUE,do.logtransform=TRUE)

# arrayQualityMetrics of AEsetnorm (normalized data of E-GEOD-3419) - rma normalization convert expression value into log-transformed expression value, so we do ont input "do.logtransform".
arrayQualityMetrics(expressionset=AEsetnorm,outdir="Report_for_nE-GEOD-3419",force=TRUE)

Comparison AEset with AEsetnorm

Array metadata and outlier detection overview - These tables showed change of outlier arrays before and after normalization. - Through these tables and plots, I identified outlier arrays (#12 and #16 array) and analyzed DEGs in all arrays and subsets removed outlier arrays through limma

DEGs analysis (limma) at AEsetnorm dataset removed outlier arrays

library(Biobase)

library(limma)

fvarLabels(AEsetnorm)<-make.names(fvarLabels(AEsetnorm))

gsms<-"01000101110X110X" **# first X is array # 12 and second X is array #16**

sml<-c()

for(i in 1:nchar(gsms)){sml[i]<-substr(gsms,i,i)}

sel<-which(sml !="X")

sml<-sml[sel]

gset<-AEsetnorm[,sel]

sml<-paste("G",sml,sep="")

fl<-as.factor(sml)

gset$description<-fl

design<-model.matrix(~ description + 0,gset)

colnames(design)<-levels(fl)

fit<-lmFit(gset,design)

cont.matrix<-makeContrasts(G1-G0, levels=design)

fit2<-contrasts.fit(fit,cont.matrix)

fit2<-eBayes(fit2, 0.01)

tT<-topTable(fit2, adjust="fdr",sort.by="B",number=250)

class(tT)

'data.frame':   250 obs. of  6 variables:
 **$ logFC**    : num  1.58 -1.6 -2.16 1.67 -3.12 ...
 **$ AveExpr**  : num  5.95 7.35 8.25 8.52 8.74 ...
 **$ t**        : num  12.49 -9.36 -9.1 9.06 -8.87 ...
 **$ P.Value**  : num  1.87e-09 9.66e-08 1.40e-07 1.48e-07 1.96e-07 ...
 **$ adj.P.Val**: num  4.18e-05 8.25e-04 8.25e-04 8.25e-04 8.74e-04 ...
 **$ B**        : num  10.62 7.62 7.31 7.27 7.04 ...

tT<-subset(tT,select=c("AveExpr","adj.P.Val","P.Value","t","B","logFC"))

write.table(tT,file="GSE3419.txt",row.names=T,sep="\t")

DEGs analysis (limma) at AEsetnorm dataset (all arrays - 16 arrays)

fvarLabels(AEsetnorm)<-make.names(fvarLabels(AEsetnorm))

gsms_1<-"0100010111001100"

sml_1<-c()

for(i in 1:nchar(gsms_1)){sml_1[i]<-substr(gsms_1,i,i)}

sml_1<-paste("G",sml_1,sep="")

fl<-as.factor(sml_1)

AEsetnorm$description<-fl

design<-model.matrix(~ description + 0,AEsetnorm)

colnames(design)<-levels(fl)

fit1<-lmFit(AEsetnorm,design)

cont.matrix<-makeContrasts(G1-G0,levels=design)

fit2<-contrasts.fit(fit1,cont.matrix)

fit2<-eBayes(fit2,0.01)

tT<-topTable(fit2,adjust="fdr",sort.by="B",number=250)

tT<-subset(tT,select=c("AveExpr","adj.P.Val","P.Value","t","B","logFC"))

write.table(tT,file="GSE3419_2.txt",row.names=T,sep="\t")

Venndiagram of 2-fold change DEGs between AEsetnorm dataset removed outlier arrays and AEsetnorm all arrays

Up-regulated Genes Gene 0.5<DEGs<2 Down-regulated Genes

I identified the clear difference before and after outlier removal. In conclusion, I think that outlier removal is pivotal process to find DEGs correctly.

tutorial R gene • 1.8k views
ADD COMMENTlink modified 3.0 years ago by genomax68k • written 3.0 years ago by majuang6680

Thank you majuang66,helpful and nice Tutorial

ADD REPLYlink written 3.0 years ago by Shamim Sarhadi210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1379 users visited in the last hour