I am new to microarray analysis. I am attempting to do basic QC for gene expression data obtained from Affymetrix HG-U133A chips, before exploring different methods of normalization.
I have been working through various tutorials online and have been looking at output from the qc function in Bioconductor's simpleaffy package and the fitPLM function in affyPLM.
My data set contains some poor-quality image files: raw probe intensity density plots reveal a number of outliers, and a number of CEL files fail QC metrics such as the 3'/5' beta-actin and GAPDH ratios.
I wish to exclude such files from downstream analysis.
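For concreteness, the QC steps described above look roughly like the sketch below. It is illustrative rather than a record of my actual session: it assumes all CEL files sit in the current working directory and that the affy, simpleaffy and affyPLM packages (plus the chip's CDF package) are installed, and the variable names are arbitrary.

```r
library(affy)        # ReadAffy(), AffyRNAdeg(), hist() method for AffyBatch
library(simpleaffy)  # qc(), ratios()
library(affyPLM)     # fitPLM(), NUSE(), RLE()

# Assumption: every CEL file of the study is in the working directory.
abatch <- ReadAffy()

# Raw probe intensity densities, one curve per array.
hist(abatch)

# Affymetrix-style QC metrics (scale factors, 3'/5' ratios, etc.).
qc_result <- qc(abatch)
plot(qc_result)
ratios(qc_result)    # 3'/5' and 3'/M ratios for beta-actin and GAPDH

# RNA degradation curves, one per array.
deg <- AffyRNAdeg(abatch)
plotAffyRNAdeg(deg)

# Probe-level model fit across all arrays, then per-array summaries.
pset <- fitPLM(abatch)
nuse_stats <- NUSE(pset, type = "stats")  # median and IQR of NUSE per array
rle_stats  <- RLE(pset, type = "stats")   # median and IQR of RLE per array
```

Note that fitPLM fits the model across every array passed to it, which is why the definition of "data set" matters for the questions below.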
However, from Gentleman, R. et al. 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, and from Gentleman, R. 2007. Some Quality methods for affymetrix microarrays (http://www.bioconductor.org/help/course-materials/2007/biocadv/Labs/AffyQuality/AffyQuality.pdf), I see that RNA degradation plots and NUSE values are not comparable across data sets. I assume this also applies to other metrics.
How loosely can "data set" be applied?
My "data set" consists of 150 CEL files from independent samples in a single study, processed and run on numerous dates over the course of several years.
Is it valid to perform these QC procedures treating my CEL files as a single data set? If not, are there any valid methods of performing QC in such situations?
Thanks in advance for any assistance.