Question: QC and Preprocessing of Gene Expression Microarrays - When Is a "Data Set" a Single Data Set?
7.4 years ago, Evansa wrote:

Hello everyone,

I am new to microarray analysis. I am attempting to do basic QC for gene expression data obtained from Affymetrix HG-U133A chips, before exploring different methods of normalization.

I have been working through various tutorials online and looking at output from the qc function in Bioconductor's simpleaffy package and the fitPLM function in affyPLM.

My data set contains some poor quality image files. Raw probe intensity density plots reveal a number of outliers, and a number of CEL files fail QC metrics such as the 3'/5' beta-actin and GAPDH ratios.

I wish to exclude such files from downstream analysis.

However, from Gentleman, R. et al. 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, and Gentleman, R. 2007. Some Quality methods for Affymetrix microarrays, I see that RNA degradation plots and NUSE values are not comparable across data sets. I guess this applies to other metrics too.

How loosely can "data set" be applied?

My "data set" consists of 150 CEL files from independent samples, processed and run on numerous dates over the course of several years, all from a single study.

Is it valid to perform these QC procedures treating my CEL files as a single data set? If not, are there any valid methods of performing QC in such situations?

Thanks in advance for any assistance.

Tags: gene, bioconductor, qc, microarray
7.4 years ago, boczniak767 wrote:

Hi Evansa,

I think a "dataset" (also in view of Gentleman's article) is simply a collection of data obtained in a given experiment, i.e. from arrays that were hybridized within a short period of time, using the same batches of reagents and the same conditions (everything that could conceivably change by the next experiment).

I'd advise you to remove outliers. You can also check whether your data show a batch effect.

HTH Maciej
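
To make the "judge each array against its own run" idea concrete, here is a minimal sketch in plain Python. It is not taken from simpleaffy or affyPLM. It assumes you have already exported one summary QC number per array (for example a median NUSE) together with a run label; the function name, the 3-MAD cut-off and the toy numbers are all hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers_by_batch(metrics, batches, threshold=3.0):
    """Return indices of arrays whose QC metric lies far from their own batch median.

    metrics   : one summary QC value per array (assumed input, e.g. median NUSE)
    batches   : run/batch label for each array, same order as metrics
    threshold : cut-off in robust standard deviations (arbitrary illustrative default)
    """
    # Group array indices by batch so each run is judged only against itself.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        by_batch[batch].append(idx)

    flagged = []
    for idxs in by_batch.values():
        values = [metrics[i] for i in idxs]
        med = median(values)
        # Median absolute deviation: a spread estimate that outliers barely affect.
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no measurable spread in this batch, nothing to compare against
        for i in idxs:
            # 1.4826 rescales the MAD to match a standard deviation under normality.
            if abs(metrics[i] - med) / (1.4826 * mad) > threshold:
                flagged.append(i)
    return sorted(flagged)


# Toy data: two runs with different baselines; one array in run "A" is off.
metrics = [1.00, 1.01, 0.99, 1.02, 1.30, 1.10, 1.11, 1.09, 1.12, 1.10]
batches = ["A"] * 5 + ["B"] * 5
print(flag_outliers_by_batch(metrics, batches))
```

With very small runs (only a handful of arrays each) the median and MAD are themselves unstable, so flags from a sketch like this are a prompt for a closer look at the array rather than an automatic exclusion rule.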


Hi Maciej,

Thank you for your reply.

This is as I had feared. The arrays were processed over several years and forty runs.

Are there QC strategies that one can apply to such data in order to remove outliers, as NUSE plots and density histograms can't validly be applied to my data as a whole?

I was intending to look for batch effects downstream of QC and normalization, but am not sure if analysis of these CEL files is feasible, or if it is something that should be attempted?

I am doubtful, but hopeful. I would be grateful for advice, even if that advice is that the analysis is not achievable.

written 7.4 years ago by Evansa

If the data look good within runs (which probably correspond to replications), you can try specifying batch as a blocking factor in your analysis.

written 7.4 years ago by boczniak767
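
As a rough illustration of what blocking on batch buys you, the sketch below compares two groups within each run and then averages those within-run differences, so that baseline shifts between runs cancel out. This is a deliberately simplified stand-in written in plain Python, not the poster's method: in Bioconductor the usual route is to include batch as a term in the design matrix of a linear model (for example with limma). The function name, group labels and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def blocked_group_difference(values, groups, batches, case="treated", control="control"):
    """Average the case-minus-control difference computed separately within each batch.

    Batches that lack one of the two groups carry no within-batch information
    and are skipped. Each remaining batch gets equal weight (a simplification).
    """
    # cells[batch][group] -> expression values for one gene
    cells = defaultdict(lambda: defaultdict(list))
    for value, group, batch in zip(values, groups, batches):
        cells[batch][group].append(value)

    diffs = []
    for by_group in cells.values():
        if by_group[case] and by_group[control]:
            diffs.append(mean(by_group[case]) - mean(by_group[control]))
    if not diffs:
        raise ValueError("no batch contains both groups")
    return mean(diffs)


# Toy gene: run 2 sits about 3 units higher than run 1, and the design is unbalanced.
values = [5.0, 5.2, 5.1, 6.1, 8.1, 9.0, 9.2, 9.1]
groups = ["control", "control", "control", "treated",
          "control", "treated", "treated", "treated"]
batches = ["run1"] * 4 + ["run2"] * 4

blocked = blocked_group_difference(values, groups, batches)
pooled = (mean(v for v, g in zip(values, groups) if g == "treated")
          - mean(v for v, g in zip(values, groups) if g == "control"))
print(round(blocked, 2), round(pooled, 2))
```

In this toy example the within-run difference is about 1.0 in both runs, whereas pooling all arrays and ignoring the run gives about 2.5, because most treated arrays happen to sit in the run with the higher baseline. Note that blocking only helps when groups are mixed within runs; if a run contains a single group, batch and biology are confounded for those arrays.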