I am reading the Irizarry 2003 paper 'Exploration, normalization, and summaries of high density oligonucleotide array probe level data' and I am confused about the datasets and the number of arrays he used.

*The varying concentration series data set, B1*

For an individual array, all of the 11 control cRNAs were spiked-in at the same concentration and this concentration was varied across arrays, taking the values 0.0, 0.5, 0.75, 1, 1.5, 2, 3, 5, 12.5, 25, 50, and 150 pM. For example, array 1 had all control cRNAs spiked with 0.0 pM and array 2 had all control cRNAs spiked with 0.5 pM, etc. Of these 12 concentrations, 0, 0.5, 0.75, 1, 1.5, 2, 3 were represented on just one array, 5 and 100 on two arrays, and the rest were in triplicate, i.e. on three arrays for a total of 27 arrays.

So, where does the concentration of 100 come from (with 100 there would be 13 concentrations)? If it was a mistake and he meant 150 instead of 100, then we have 7 arrays with unique concentrations (0, 0.5, 0.75, 1, 1.5, 2, 3), 4 arrays in total for the duplicated ones (2 for 5 pM and 2 for 100, read as 150 pM), and for the remaining 3 concentrations (we have used 9 already, so 12 - 9 = 3) we have 9 chips in total. This gives 7 + 4 + 9 = 20 arrays. Where is my mistake? I do not see how he arrives at 27 arrays.
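To make the count explicit, here is a small Python sketch of the tally under the two readings I can think of. Both readings are my own assumptions, not something the paper states: either "100" is a typo for 150, or 100 pM is a 13th concentration that is missing from the list. The helper `total_arrays` is just an illustrative name.

```python
# Concentrations (pM) listed in the quoted description of data set B1.
listed = [0.0, 0.5, 0.75, 1, 1.5, 2, 3, 5, 12.5, 25, 50, 150]
singles = [0.0, 0.5, 0.75, 1, 1.5, 2, 3]  # "represented on just one array"


def total_arrays(concentrations, duplicates):
    """Count arrays: 1 per single, 2 per duplicate, 3 for everything else."""
    total = 0
    for c in concentrations:
        if c in singles:
            total += 1
        elif c in duplicates:
            total += 2
        else:
            total += 3
    return total


# Reading 1 (assumption): "100" is a typo for 150.
typo_reading = total_arrays(listed, duplicates=[5, 150])

# Reading 2 (assumption): 100 pM is a 13th concentration missing from the list.
extra_reading = total_arrays(listed + [100], duplicates=[5, 100])

print(typo_reading, extra_reading)  # 20 23
```

Neither reading gives 27, so the quoted description by itself does not seem to be enough to reproduce that number.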

*Latin square series data set, B2*

In this series each of the 11 control cRNAs were spiked-in at a different concentration on each array (apart from replicates). So, for one chip I have exactly one control cRNA with all concentrations. Is that right?

The 12 concentrations used were 0.5, 1, 1.5, 2, 3, 5, 12.5, 25, 37.5, 50, 75, and 100 pM, and these were arranged in a 12 × 12 cyclic Latin square, with each concentration appearing once in each row and column. The 12 combinations of concentrations used on the arrays were taken from the first 11 entries of the 12 rows of this Latin square. Of the 12 combinations used, 11 were done on three arrays and one on just one array.
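As a way of checking my reading of this description, here is a minimal Python sketch of a 12 × 12 cyclic Latin square built from the 12 concentrations. The shift direction and the idea that column `j` corresponds to the `j`-th control cRNA are my assumptions; the paper only says the square is cyclic and that the first 11 entries of each row are used.

```python
# The 12 concentrations (pM) quoted for data set B2.
concs = [0.5, 1, 1.5, 2, 3, 5, 12.5, 25, 37.5, 50, 75, 100]
n = len(concs)

# Cyclic Latin square: each row is the previous row shifted by one position.
# (Shift direction is an assumption; any cyclic shift gives a Latin square.)
square = [[concs[(row + col) % n] for col in range(n)] for row in range(n)]

# Every concentration appears exactly once in each row and each column.
for row in square:
    assert sorted(row) == sorted(concs)
for col in zip(*square):
    assert sorted(col) == sorted(concs)

# One combination per row: the first 11 entries, one per control cRNA
# (mapping of columns to cRNAs is an assumption for illustration).
combos = [row[:11] for row in square]

# Arrays implied by the description: 11 combinations in triplicate, 1 once.
n_arrays = 11 * 3 + 1 * 1

print(combos[0])
print(n_arrays)  # 34
```

Under this reading, one array carries all 11 control cRNAs, each at a different concentration, and the next combination shifts every cRNA to the next concentration. The description then implies 11 × 3 + 1 = 34 arrays for B2.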

So the first chip gets one combination of concentrations of only one control cRNA, e.g. 1, 1.5, 0.5, 2, 3, 25, 5, 12.5, 37.5, 50, 75, 100, and another chip gets another combination of the concentrations of the same control cRNA. Is that right? In Bolstad's paper 'A comparison of normalization methods...' he refers to this dataset, writing:

The spike-in data series consists of 98 arrays. 27 arrays are a dilution series.

So 98 - 27 = 71 arrays come from the Latin square series data set, B2. But how do I get exactly this number?

Thanks a lot.

Have you considered asking the authors? Rafael is usually quite approachable (as is Ben).

