I'm trying to study and understand how the MAS 5.0 algorithm works, and I'm reading this manual: http://media.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf. This is my first time approaching this field, and the data-processing steps are quite difficult and often unclear to me. I have several questions, each related to a different part of the algorithm. In particular, I would like to clarify the following points:
1) What are the main steps (in theory) that the MAS 5.0 algorithm performs?
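As a rough illustration of the core of the single-array analysis described in the whitepaper, here is a minimal Python sketch of the signal computation for one probe set. An ideal mismatch (IM) is derived from the MM values, each probe contributes log2(PM - IM), and the probe set signal is the one-step Tukey biweight of those values, transformed back from log2. It assumes the PM and MM intensities are positive and already background-corrected (the zone-based background step is omitted), and it leaves out the final scaling to a target intensity. The constants (c = 5, epsilon = 0.0001, contrast tau = 0.03, scale tau = 10, delta = 2^-20) are the defaults given in the whitepaper as far as I can tell; the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def tukey_biweight(x: Sequence[float], c: float = 5.0, eps: float = 1e-4) -> float:
    """One-step Tukey biweight: an outlier-resistant weighted mean."""
    m = median(x)
    s = median([abs(v - m) for v in x])  # median absolute deviation
    num = den = 0.0
    for v in x:
        u = (v - m) / (c * s + eps)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # far outliers get weight 0
        num += w * v
        den += w
    return num / den


def ideal_mismatch(pm: Sequence[float], mm: Sequence[float],
                   contrast_tau: float = 0.03, scale_tau: float = 10.0) -> List[float]:
    """Replace each MM by a value that is always below its PM."""
    # Typical specific background of the probe set: robust mean of log2(PM/MM).
    sb = tukey_biweight([math.log2(p) - math.log2(m) for p, m in zip(pm, mm)])
    im = []
    for p, m in zip(pm, mm):
        if m < p:
            im.append(m)  # MM is usable as-is
        elif sb > contrast_tau:
            im.append(p / 2 ** sb)  # borrow the probe set's typical PM/MM ratio
        else:
            im.append(p / 2 ** (contrast_tau / (1 + (contrast_tau - sb) / scale_tau)))
    return im


def probe_set_signal(pm: Sequence[float], mm: Sequence[float],
                     delta: float = 2 ** -20) -> float:
    """Unscaled MAS 5.0-style signal for one probe set, on the linear scale."""
    im = ideal_mismatch(pm, mm)
    pv = [math.log2(max(p - i, delta)) for p, i in zip(pm, im)]
    return 2 ** tukey_biweight(pv)


# Hypothetical probe pairs; the last PM is an outlier that the biweight down-weights.
print(probe_set_signal([1000.0, 1100.0, 900.0, 5000.0], [200.0, 250.0, 150.0, 300.0]))
```

The biweight is what makes the signal robust: a single probe with an unusual PM - IM difference receives little or no weight instead of dragging the average.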
2) Does MAS 5.0 processing also require the "detection call" step, or is the detection call a separate procedure for processing the data? Why would one use mas5 rather than the detection call, or vice versa (Bioconductor affy: mas5 and mas5calls)? Can the detection call be used to process data from a single array as well as from multiple arrays? In the manual, the section on comparative analysis (comparison calls) refers to a baseline array, but I didn't understand whether it is a dedicated reference array or can be any of my other experiments that I want to compare against. I'm aware that I'm very confused about this topic, and I hope someone can help me!
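To make the detection call more concrete, here is a minimal Python sketch of the detection step as the whitepaper describes it: for each probe pair a discrimination score R = (PM - MM) / (PM + MM) is computed, a one-sided Wilcoxon signed-rank test checks whether R is typically above a threshold tau, and the p-value is turned into Present (P), Marginal (M) or Absent (A). It uses only the PM/MM pairs of one probe set on one array. The defaults tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are the ones I believe the whitepaper lists; saturated-probe handling is omitted, and the exact test with averaged ties is my own choice, not necessarily what affy's mas5calls does internally. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def discrimination_scores(pm: Sequence[float], mm: Sequence[float]) -> List[float]:
    """R = (PM - MM) / (PM + MM) for each probe pair."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def doubled_midranks(values: Sequence[float]) -> List[int]:
    """Ranks 1..n with ties averaged, multiplied by 2 so they stay integers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks2 = [0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks i+1 .. j+1
        i = j + 1
    return ranks2


def signed_rank_p_greater(diffs: Sequence[float]) -> float:
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median > 0."""
    d = [x for x in diffs if x != 0]  # zero differences are dropped
    n = len(d)
    if n == 0:
        return 1.0
    ranks2 = doubled_midranks([abs(x) for x in d])
    observed = sum(r for r, x in zip(ranks2, d) if x > 0)
    total = sum(ranks2)
    # counts[s] = number of the 2**n equally likely sign patterns with positive-rank sum s
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** n


def detection_call(pm: Sequence[float], mm: Sequence[float], tau: float = 0.015,
                   alpha1: float = 0.04, alpha2: float = 0.06) -> Tuple[str, float]:
    """Return ('P' | 'M' | 'A', p-value) for one probe set."""
    r = discrimination_scores(pm, mm)
    p = signed_rank_p_greater([x - tau for x in r])
    if p < alpha1:
        return "P", p
    if p < alpha2:
        return "M", p
    return "A", p


# Hypothetical probe set with PM clearly above MM on all 11 pairs.
print(detection_call([1000.0 + 100 * i for i in range(11)], [100.0] * 11))
```

Note that the call is qualitative: it says whether the transcript seems detectable on that array, not how much of it there is.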
3) If I want to compare different microarray samples, must the mas5 algorithm be applied to all samples together or to one sample at a time? I assumed that the normalization procedure depends on the number of different arrays (.cel files). The real question I'm wondering about is this: if I take a series matrix from the GEO database containing samples processed with mas5, are the probe set values comparable between samples? In other words, does a value of 1250 for a probe set in sample 1 correspond to the same expression level as a value of 1250 for that probe set in sample 2?
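For the normalization part of question 3, the scaling step in the whitepaper can be sketched as follows: the signals of one array are multiplied by a scale factor chosen so that their trimmed mean (dropping the lowest and highest 2% of values) equals a target intensity, 500 by default. This minimal Python sketch shows that the scale factor is computed from each array on its own. How the 2% cut is rounded is an assumption, any additional normalization used in comparison analysis is not covered, and the two arrays are hypothetical.

```python
from typing import List, Sequence, Tuple


def trimmed_mean(values: Sequence[float], trim: float = 0.02) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)  # assumption: the same count is dropped from each end
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)


def scale_to_target(signals: Sequence[float],
                    target: float = 500.0) -> Tuple[List[float], float]:
    """Global scaling of ONE array so that its trimmed mean equals `target`."""
    sf = target / trimmed_mean(signals)
    return [s * sf for s in signals], sf


# Two hypothetical arrays; array_b is array_a with every signal doubled.
array_a = [float(i) for i in range(1, 101)]
array_b = [2 * x for x in array_a]
scaled_a, sf_a = scale_to_target(array_a)
scaled_b, sf_b = scale_to_target(array_b)
print(sf_a, sf_b)
```

After scaling, both arrays have the same trimmed mean. This is only a single global, linear factor per array: it aligns the overall intensity of the arrays, not each probe set individually.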
4) Comparing mas5 and RMA, I know that one of the differences is that RMA expression values are log2-transformed, while mas5 expression values are absolute (non-log) values. I'm trying to extract information from a GEO study processed with mas5 (the study doesn't provide the .cel files, so I can't re-process the data with another algorithm). To find differential expression between two conditions I need to apply some test, but first I plotted the distribution of expression values for each sample and noticed that it looks exponential: most probe sets have low values close to 0, which makes the distribution hard to interpret. I think I can't apply a classical test such as a z-score or a t-test, since these assume a normal distribution, but I'm not sure this conclusion is right, so please correct me if I'm wrong. Finally, I'm wondering whether I can transform the data to log2 scale so that the values are spread over a different range, the distribution takes on a roughly normal shape, and I can then proceed with a test.
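A minimal sketch of the log2 route from question 4, using only the Python standard library: values are log2-transformed (with a hypothetical floor of 1 so that zeros or negatives do not break the logarithm), and a Welch two-sample t statistic is computed for one probe set across the replicates of two conditions. Keep in mind that the t-test's normality assumption concerns the values of a single probe set within each group, not the distribution of all probe sets in one sample. A p-value would additionally need the t distribution (for example scipy.stats.ttest_ind with equal_var=False), which is left out here; the replicate values are made up.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def log2_transform(values: Sequence[float], floor: float = 1.0) -> List[float]:
    """log2 of linear-scale values; `floor` is a hypothetical guard against <= 0."""
    return [math.log2(max(v, floor)) for v in values]


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Needs at least two replicates per group and non-zero variance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical signals of one probe set, three replicates per condition.
cond1 = log2_transform([1000.0, 1200.0, 1100.0])
cond2 = log2_transform([250.0, 300.0, 275.0])
t, df = welch_t(cond1, cond2)
print(f"t = {t:.2f}, df = {df:.2f}")
```

On the log2 scale a difference of 2 between group means corresponds to a 4-fold change, which is one reason fold changes are usually reported on this scale.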
Sorry for the long questions. Can someone help me?
Thanks for your time.