I'm working with the LINCS L1000 perturbation microarray expression data (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70138). Essentially, it's a set of cell lines with expression profiles measured under a variety of drug treatment conditions.
I've been able to parse the data I need, and now I need to determine whether the log-transformed probe intensities pass a noise/detection filter. I've written software to help with this (https://github.com/mforde84/microarray_filter). My approach is to collect the log-transformed signal intensities of the AFFX control probes for each sample, then use the 95th percentile of that subset as the cutoff.
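To make the approach concrete, here is a minimal sketch of the per-sample cutoff as I've described it. It is not necessarily what microarray_filter does internally; the function names (`noise_cutoff`, `passes_filter`), the dict-based sample layout, and the choice of the "inclusive" interpolation method are all assumptions made for illustration.

```python
from statistics import quantiles

def noise_cutoff(control_log2, pct=95):
    """Return the pct-th percentile of the control probe log2 intensities."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99;
    # "inclusive" treats the data as the full population (an assumption).
    cuts = quantiles(control_log2, n=100, method="inclusive")
    return cuts[pct - 1]

def passes_filter(sample, control_ids, pct=95):
    """Flag each non-control probe as above (True) or below (False) the cutoff.

    sample      -- hypothetical layout: dict of probe ID -> log2 intensity
    control_ids -- set of probe IDs treated as controls for this sample
    """
    controls = [sample[p] for p in control_ids if p in sample]
    cutoff = noise_cutoff(controls, pct)
    flags = {p: v > cutoff for p, v in sample.items() if p not in control_ids}
    return cutoff, flags
```

The cutoff depends entirely on which IDs go into `control_ids`, which is exactly the choice I'm unsure about.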
My question is: which control probes should I use to filter out noise most accurately? The control probes appear to cluster into three ranges of log signal intensity (roughly 3-5, 5-9, and 10-15) and represent different types of probe targets. For example, some seem to target predicted intronic regions for E. coli, whereas others target human housekeeping genes (e.g., GAPDH). I'm concerned that aggregating all of the controls is inappropriate, since it would incorporate real signal from what appear to be positive controls, whereas selecting a subset of control probes feels like cherry-picking. The probes in question are listed below:
probe target log2
AFFX-DapX-3_at CONTROL 6.0301
AFFX-DapX-5_at CONTROL 5.4277
AFFX-DapX-M_at CONTROL 5.4977
AFFX-HSAC07/X00351_3_at ACTB 13.783
AFFX-HSAC07/X00351_5_at ACTB 14.8095
AFFX-HSAC07/X00351_M_at ACTB 13.5602
AFFX-hum_alu_at CONTROL 13.9687
AFFX-HUMGAPDH/M33197_3_at GAPDH 13.7048
AFFX-HUMGAPDH/M33197_5_at GAPDH 13.2631
AFFX-HUMGAPDH/M33197_M_at GAPDH 12.8602
AFFX-HUMISGF3A/M97935_3_at STAT1 8.6337
AFFX-HUMISGF3A/M97935_5_at STAT1 8.0191
AFFX-HUMISGF3A/M97935_MA_at STAT1 8.6988
AFFX-HUMISGF3A/M97935_MB_at STAT1 9.2084
AFFX-HUMRGE/M10098_3_at CONTROL 11.7783
AFFX-HUMRGE/M10098_5_at CONTROL 13.6797
AFFX-HUMRGE/M10098_M_at CONTROL 12.2474
AFFX-LysX-3_at CONTROL 3.8404
AFFX-LysX-5_at CONTROL 4.5422
AFFX-LysX-M_at CONTROL 4.6521
AFFX-M27830_3_at CONTROL 5.3102
AFFX-M27830_5_at CONTROL 9.6554
AFFX-M27830_M_at CONTROL 7.8523
AFFX-PheX-3_at CONTROL 5.4522
AFFX-PheX-5_at CONTROL 4.9566
AFFX-PheX-M_at CONTROL 4.6067
AFFX-r2-Bs-dap-3_at CONTROL 5.5368
AFFX-r2-Bs-dap-5_at CONTROL 5.568
AFFX-r2-Bs-dap-M_at CONTROL 5.4725
AFFX-r2-Bs-lys-3_at CONTROL 3.9217
AFFX-r2-Bs-lys-5_at CONTROL 4.486
AFFX-r2-Bs-lys-M_at CONTROL 4.3319
AFFX-r2-Bs-phe-3_at CONTROL 3.8972
AFFX-r2-Bs-phe-5_at CONTROL 4.6683
AFFX-r2-Bs-phe-M_at CONTROL 4.365
AFFX-r2-Bs-thr-3_s_at CONTROL 4.3526
AFFX-r2-Bs-thr-5_s_at CONTROL 3.4867
AFFX-r2-Bs-thr-M_s_at CONTROL 3.6264
AFFX-r2-Ec-bioB-3_at CONTROL 8.4213
AFFX-r2-Ec-bioB-5_at CONTROL 8.5667
AFFX-r2-Ec-bioB-M_at CONTROL 8.489
AFFX-r2-Ec-bioC-3_at CONTROL 10.9148
AFFX-r2-Ec-bioC-5_at CONTROL 11.2974
AFFX-r2-Ec-bioD-3_at CONTROL 14.9689
AFFX-r2-Ec-bioD-5_at CONTROL 15
AFFX-r2-P1-cre-3_at CONTROL 15
AFFX-r2-P1-cre-5_at CONTROL 15
AFFX-ThrX-3_at CONTROL 4.3881
AFFX-ThrX-5_at CONTROL 3.7677
AFFX-ThrX-M_at CONTROL 3.3396
AFFX-TrpnX-3_at CONTROL 4.3796
AFFX-TrpnX-5_at CONTROL 4.5164
AFFX-TrpnX-M_at CONTROL 4.5939