I am new to microarray data analysis. What I want is estimates of absolute expression values from microarray data in ArrayExpress or GEO. For example, from this dataset: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-11041, where I can get both raw and processed data. I looked at the CALIB package in Bioconductor, but I can't figure out how to convert the ArrayExpress data into the formats CALIB expects. Has anybody done this before? Many thanks!
Microarrays won't give you absolute expression values, only relative ones. These can be compared with each other to give fold changes (e.g. gene 1 is upregulated 2-fold in sample A compared to sample B).
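To make the fold-change arithmetic concrete, here is a minimal sketch. It assumes the expression values are already on the log2 scale (as RMA output is), so a difference of log2 values is a ratio on the linear scale. The function names and the gene values are made up for illustration.

```python
import math

def fold_change(log2_a: float, log2_b: float) -> float:
    """Linear fold change of sample A relative to sample B for one gene,
    given log2-scale expression values (assumption: data already log2)."""
    # A difference on the log2 scale is a ratio on the linear scale.
    return 2 ** (log2_a - log2_b)

def log2_ratio(intensity_a: float, intensity_b: float) -> float:
    """log2 ratio computed from linear (un-logged) intensities."""
    return math.log2(intensity_a / intensity_b)

# Hypothetical values for "gene 1": 9.0 in sample A, 8.0 in sample B (log2 scale).
print(fold_change(9.0, 8.0))
print(log2_ratio(1200.0, 600.0))
```

Note that neither the 9.0 nor the 1200 says anything about how many transcripts are in the cell; only the comparison between samples for the same probe is interpretable.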
If you want to process the raw data on your own, a good starting point is the Bioconductor manual:
You could rely on the processed data and calculate fold changes and p-values with, e.g., limma, or use a tool like GEO2R:
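limma fits a linear model per gene and then moderates the per-gene variances with an empirical Bayes step that borrows information across all genes. The sketch below is a deliberately simpler stand-in, not limma: an ordinary Welch t-statistic per gene on log2 values, with hypothetical replicate data and function names. Turning t into a p-value needs the t distribution (e.g. from SciPy), which is left out to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene; inputs are log2 values per replicate."""
    # Sample variances (n - 1 denominator), no pooling and no moderation.
    va, vb = variance(group_a), variance(group_b)
    se = sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def summarize(expr_a, expr_b):
    """expr_a / expr_b: dict gene -> list of log2 replicate values per condition.
    Returns dict gene -> (log2 fold change, Welch t)."""
    out = {}
    for gene in expr_a:
        a, b = expr_a[gene], expr_b[gene]
        out[gene] = (mean(a) - mean(b), welch_t(a, b))
    return out

# Hypothetical triplicates on the log2 scale.
cond_a = {"geneX": [9.1, 8.9, 9.0], "geneY": [6.0, 6.2, 5.8]}
cond_b = {"geneX": [8.0, 8.1, 7.9], "geneY": [6.1, 5.9, 6.0]}
for gene, (lfc, t) in summarize(cond_a, cond_b).items():
    print(gene, round(lfc, 2), round(t, 2))
```

With only a few replicates, per-gene variances like these are noisy, which is exactly the problem limma's moderation is designed to address.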
Just to be clear, you can get raw intensity values (which are the processed data you are talking about). These should correlate with absolute expression, but they are direct measures of hybridization produced by an image-processing program (although, to be honest, even RNA-Seq counts are only correlated with the true transcript abundances in the cell). I'm not familiar with the CALIB package, but you can probably find a Bioconductor package that can process the raw data (at least for common platforms like Affy human/mouse arrays; I'm not sure about this E. coli array).
More generally, you will have to use some sort of relatively arbitrary cutoff if you want to classify a gene as "expressed" or "not expressed". Here are some possible suggestions:
1) Create an MA plot and choose a cutoff where the variability appears to decrease substantially.
2) Look at the distribution of signal within each sample. Is there anything like an inflection point in the signal, or evidence of a bimodal expression pattern? This paper provides an example of the first option:
3) Look in the literature for studies that have used that array design. For example, an RMA expression value (log2 scale) of 2 or 3 on an Affy array is probably just noise.
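Suggestions 1 and 3 can be sketched roughly as follows, assuming two arrays already on the log2 scale (e.g. RMA output) given as plain lists. The code computes M (log2 difference) and A (average log2 intensity) per probe, reports the spread of M within bins of A so you can see where the variability drops off, and then applies a chosen cutoff. The bin width, the cutoff of 3.0, the function names, and the data are all illustrative assumptions, not recommended values.

```python
from statistics import pstdev

def ma_values(x, y):
    """M = log2 ratio, A = mean log2 intensity, for two arrays on the log2 scale."""
    m = [p - q for p, q in zip(x, y)]
    a = [(p + q) / 2 for p, q in zip(x, y)]
    return m, a

def m_spread_by_bin(m, a, width=2.0):
    """Population SD of M within bins of A, keyed by each bin's lower edge.
    Bins with fewer than two probes are skipped (an SD of one value is meaningless)."""
    bins = {}
    for mi, ai in zip(m, a):
        bins.setdefault(int(ai // width), []).append(mi)
    return {k * width: pstdev(v) for k, v in sorted(bins.items()) if len(v) > 1}

def call_expressed(values, cutoff):
    """True where the log2 value exceeds a user-chosen cutoff."""
    return [v > cutoff for v in values]

# Hypothetical log2 values for six probes on two arrays.
arr1 = [2.1, 3.0, 2.5, 8.0, 9.1, 10.0]
arr2 = [3.2, 2.0, 2.4, 8.1, 9.0, 10.1]
m, a = ma_values(arr1, arr2)
print(m_spread_by_bin(m, a))
print(call_expressed(arr1, cutoff=3.0))
```

In this toy data the spread of M is much larger in the low-A bin than in the high-A bin; deciding where it has "substantially" decreased on real data is still a judgment call.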