Question: Finding outliers in a microarray dataset with z-scores
arronar200 (Austria) wrote, 2.7 years ago:

Hi.

I'm reading a book on DNA microarray data analysis and trying to follow along by analyzing one of my own datasets.

So far, I have created boxplots, MA plots and correlation maps (using Spearman's correlation and the concordance correlation coefficient), both before and after normalization (I used quantile normalization).
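For readers following along, here is a base-R sketch of two steps mentioned above: a simplified quantile normalization and Lin's concordance correlation coefficient between two arrays. The function names `quantile_normalize()` and `ccc()` and the simulated data are hypothetical. The normalization breaks ties by order of appearance, so it is only an illustration. Established implementations such as `preprocessCore::normalize.quantiles()` handle ties more carefully and are the safer choice for real data.

```r
# Simplified quantile normalization (ties broken by order of appearance).
# Every array (column) is given the same distribution: the mean of the sorted columns.
quantile_normalize <- function(x) {
  ref   <- rowMeans(apply(x, 2, sort))               # mean of each order statistic
  ranks <- apply(x, 2, rank, ties.method = "first")  # position of each value in its column
  out   <- apply(ranks, 2, function(r) ref[r])       # replace values by reference quantiles
  dimnames(out) <- dimnames(x)
  out
}

# Lin's concordance correlation coefficient, using 1/n moments:
# 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
ccc <- function(x, y) {
  mx <- mean(x); my <- mean(y)
  sxy <- mean((x - mx) * (y - my))
  2 * sxy / (mean((x - mx)^2) + mean((y - my)^2) + (mx - my)^2)
}

# Hypothetical example: 1000 genes x 4 arrays measuring the same signal on different scales
set.seed(11)
base <- rexp(1000)
raw  <- sapply(1:4, function(j) base * j + rnorm(1000, sd = 0.1))
qn   <- quantile_normalize(raw)
ccc(raw[, 1], raw[, 4])   # penalised by the scale and location difference
ccc(qn[, 1], qn[, 4])     # after normalization the arrays agree much more closely
```

Unlike Pearson's correlation, the CCC also drops when two arrays differ in location or scale. That is why it is useful for comparing arrays before and after normalization.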

Now I would like to identify the outliers (some of them are visible in the boxplots), and the book I'm reading suggests the resistant z-score.

A value X_gi can then be called an outlier if z_gi is greater than, for example, 5. Is such an approach acceptable? What do you think, and what do you use in such cases?
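The book's formula did not survive in the post. A common form of the resistant z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD): z_gi = (x_gi - median_i(x_gi)) / MAD_i(x_gi). The base-R sketch below assumes the score is computed per gene g across arrays i and flags |z_gi| > 5. Both the orientation (per gene rather than per array) and the use of the absolute value are assumptions, so check them against the book. The function name `resistant_z()` and the data are hypothetical.

```r
# Resistant z-score sketch (assumption: computed per gene g, across arrays i).
# Rows = genes, columns = arrays. stats::mad() multiplies by 1.4826 by default,
# so it estimates the SD for normally distributed data.
resistant_z <- function(x) {
  med <- apply(x, 1, median)   # robust centre of each gene
  s   <- apply(x, 1, mad)      # robust spread of each gene
  (x - med) / s                # vectors recycle down the columns: one value per row
}

# Hypothetical example: 200 genes x 10 arrays with one planted outlier
set.seed(1)
x <- matrix(rnorm(200 * 10, mean = 8), nrow = 200, ncol = 10)
x[5, 3] <- x[5, 3] + 50
z <- resistant_z(x)

# Flag values with |z| > 5; genes with MAD = 0 give Inf/NaN and need separate handling
flagged <- which(abs(z) > 5, arr.ind = TRUE)
flagged
```

Because the median and MAD are barely affected by a single extreme value, the planted outlier cannot inflate its own gene's spread estimate. With an ordinary mean/SD z-score and only 10 arrays, that masking effect would be much stronger.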

Any other ideas on how to proceed with this step are welcome.

Thank you.

theobroma22 wrote, 2.6 years ago:

If it's in a book, I would think it's acceptable to use the resistant z-score. Another way to look for outliers is a Q-Q plot.


Have you used Q-Q plots to find microarray outliers? I thought Q-Q plots were not very useful with such large datasets. What about PCA? Have you ever used it for this purpose?

(reply by arronar200, 2.6 years ago)

You can use PCA, but it's generally not used to find outliers. And yes, I've used Q-Q plots in microarray analysis.

(reply by theobroma22, 2.6 years ago)

Could you suggest some resources (papers, tutorials, links) on the Q-Q technique you used?

(reply by arronar200, 2.6 years ago)

It's the standard Q-Q technique: you make a Q-Q plot of the residuals or the p-values from the linear model fitted in the microarray analysis (for example, control vs. treatment, or a time-course analysis). The points that fall off the line at either end are your outliers. Or do you mean R code?
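A base-R sketch of the idea, using ordinary pooled-variance t-statistics per gene (df = 5 + 5 - 2 = 8) instead of limma's moderated t. The simulated data and the cut-off of 2 units off the identity line are hypothetical choices for illustration, not part of the answer above.

```r
# Hypothetical data: 1000 genes, 5 control + 5 treated arrays (log2 scale);
# genes 1-20 get a true treatment effect
set.seed(7)
ctl <- matrix(rnorm(1000 * 5, mean = 8), nrow = 1000)
tmt <- matrix(rnorm(1000 * 5, mean = 8), nrow = 1000)
tmt[1:20, ] <- tmt[1:20, ] + 5

# Ordinary two-sample t-statistic per gene, equal variances (df = 8)
tstat <- unname(sapply(seq_len(nrow(ctl)), function(g)
  t.test(tmt[g, ], ctl[g, ], var.equal = TRUE)$statistic))

# Theoretical t quantiles, matched to the observed statistics in sorted order
expected <- qt(ppoints(length(tstat)), df = 8)
ord <- order(tstat)

qqplot(expected, tstat, pch = 16, cex = 0.5,
       xlab = "Theoretical t quantiles (df = 8)", ylab = "Observed t")
abline(0, 1)  # points leaving this line at either end are candidates

# Genes whose sorted t lies more than 2 units off the line (arbitrary cut-off)
off_line <- ord[abs(tstat[ord] - expected) > 2]
off_line
```

If the genes followed the null t distribution, the points would hug the identity line. The genes with a real effect bend the upper tail away from it.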

(reply by theobroma22, 2.6 years ago)

Thanks. It would be helpful to see a code example too.

(reply by arronar200, 2.6 years ago)

OK, please give me some time to put this together. In the meantime, I found this slide deck, which also includes R code: http://compdiag.molgen.mpg.de/ngfn/docs/2005/sep/beissbarth_cDNA_QCPP.pdf It also shows a Q-Q plot. I'll send you some sample R code for the Q-Q plot soon. Thanks.

(reply by theobroma22, 2.6 years ago)

theobroma22 wrote, 2.6 years ago:
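library(Biobase)  # ExpressionSet, AnnotatedDataFrame
library(limma)    # lmFit, eBayes, qqt

set.seed(329)
ctl = matrix(rnorm(10000, mean=100, sd=1), ncol = 10, nrow = 1000)  # 1000 genes x 10 control arrays
set.seed(923)
tmt = matrix(rnorm(10000, mean=200, sd=1), ncol = 10, nrow = 1000)  # 1000 genes x 10 treated arrays
Xmat = cbind(ctl, tmt)
colnames(Xmat) = 1:20
theData = data.frame(THIS=rep(c("Ctl","Tmt"), each=10), row.names=colnames(Xmat))
pd = new("AnnotatedDataFrame", data=theData)
eset = ExpressionSet(Xmat, phenoData=pd)  # "eset" rather than "exp", which would mask base::exp()
eset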

 #ExpressionSet (storageMode: lockedEnvironment)
 #assayData: 1000 features, 20 samples 
 #  element names: exprs 
 #protocolData: none
 #phenoData
 #  sampleNames: 1 2 ... 20 (20 total)
 #  varLabels: THIS
 #  varMetadata: labelDescription
 #featureData: none
 #experimentData: use 'experimentData(object)'
 #Annotation:  

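design = model.matrix(~factor(eset$THIS))  # columns: intercept, Tmt vs Ctl
fit = lmFit(eset, design)
fit = eBayes(fit)
qqt(fit$t[, 2], df=fit$df.total, pch=16, cex=0.5)  # Student's t Q-Q plot of the Tmt-vs-Ctl moderated t
abline(0, 1)  # reference line; points far from it at either end are candidate outliers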

This was, of course, done with dummy data, so the Q-Q plot looks a bit silly. See the marray and limma Bioconductor packages (p. 83) for more details.

theobroma22 wrote, 2.6 years ago:

Principal Component Analysis

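dim(Xmat)                              # 1000 genes x 20 arrays
X2mat = log(Xmat, 2)                   # log2-transform the intensities
X2mat = scale(X2mat, center = TRUE)    # standardise each array (column)
PCmat = t(X2mat)                       # prcomp() expects samples in rows, genes in columns
dim(PCmat)                             # 20 x 1000
pc2v = prcomp(PCmat, center = TRUE, scale. = FALSE)  # centre each gene across arrays
plot(pc2v, type="l")                   # scree plot: variance of each principal component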

See https://tgmstat.wordpress.com/2013/11/28/computing-and-visualizing-pca-in-r/ for more information
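A minimal base-R sketch of how PCA is often used to screen arrays rather than genes: plot the samples on the first two principal components and look for isolated points. The simulated data, the deliberately noisy `array8` and the median/MAD distance rule are illustrative assumptions, not part of the answer above.

```r
# Hypothetical data: 500 genes x 8 arrays on a log2 scale; array8 is made noisier
set.seed(42)
expr <- matrix(rnorm(500 * 8, mean = 8), nrow = 500,
               dimnames = list(NULL, paste0("array", 1:8)))
expr[, "array8"] <- expr[, "array8"] + rnorm(500, sd = 3)

# Samples as rows; prcomp() centres each gene across samples
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

plot(scores, pch = 16, xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), pos = 3, cex = 0.7)

# Robust distance of each array from the centre of the PC1/PC2 cloud
robust_dist <- sqrt(rowSums(scale(scores,
                                  center = apply(scores, 2, median),
                                  scale  = apply(scores, 2, mad))^2))
sort(robust_dist, decreasing = TRUE)
```

Scaling the scores by the median and MAD keeps the distorted array from inflating the spread it is measured against. The scree plot alone shows how much variance each component explains, but not which array drives it.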
