Two questions about spike-in data sets for microarray experiments
7.3 years ago
Avro ▴ 160

Hi everyone,

I am reading a book called "Statistics and Data Analysis for Microarrays Using R and Bioconductor". More specifically, I am looking at the limitations of microarrays, and I don't understand this sentence:

"The variance of average chip intensity among spike-in data sets is much lower than those measured in most real-life data sets, casting doubts on the general applicability of these data for developing analytical tools for highly diverse clinical expression profiles."

I have two questions:

1. I understand that spike-ins are controls that you include in your sample preparation, but how are they used when you analyze/transform your data?
2. What does the author mean by "the variance of average chip intensity among the spike-in data sets"? I know what variance is. For example, if I have 42 control genes, do I compute the average intensity for all of them for each array and then compute the variance?

Thank you!

microarray • 2.1k views
7.3 years ago
JC 13k
1. Spike-in sequences are used to scale the intensities across chips. Suppose you have one spike-in gene on two chips: if the first chip shows an expression level of 100 for this gene and the second shows 200, you can either double all values on chip 1 or halve all values on chip 2. In practice you have more than one sequence, at different concentrations, so you can adjust the distributions of your intensity values using more sophisticated methods.
2. Yes. But the point is that spike-in sequences have lower variance than the real genes in your samples, so they do not represent real data well.
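
To make both points concrete, here is a minimal sketch in plain Python. The intensity values, the position of the spike-in probe, and the function names are all made up for illustration; real pipelines use several spike-ins and more careful methods. It rescales one chip so its spike-in matches a reference chip (the 100 vs. 200 case), and it computes the variance of the average chip intensity by averaging within each chip first and then taking the variance of those averages.

```python
from statistics import mean, variance

# Hypothetical raw intensities; the first value on each chip is the spike-in probe.
chip1 = [100.0, 50.0, 400.0]
chip2 = [200.0, 110.0, 790.0]
SPIKE_INDEX = 0  # assumed position of the spike-in probe in this sketch


def scale_to_reference(chip, reference, spike_index=SPIKE_INDEX):
    """Rescale `chip` so that its spike-in intensity equals the reference chip's."""
    factor = reference[spike_index] / chip[spike_index]
    return [value * factor for value in chip]


def variance_of_chip_means(chips):
    """Average the intensities within each chip, then take the sample variance of those averages."""
    return variance([mean(chip) for chip in chips])


# Chip 1 reads 100 for the spike-in and chip 2 reads 200, so chip 1 is doubled.
chip1_scaled = scale_to_reference(chip1, chip2)
print(chip1_scaled)
print(variance_of_chip_means([chip1_scaled, chip2]))
```

Scaling chip 2 down to chip 1 instead would work equally well; only the ratio between the spike-in readings matters here.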

Hi! Thank you for your answers! So, this is done so we can compare mRNA expression between different platforms/conditions. It's all about normalization. I'm sorry, but I don't understand your last sentence about using different concentrations of the same sequence.

So, these spike-ins are only good for normalizing since they do not reflect the full spectrum of gene expression variability, right?

Thank you very much!


No, the spike-ins are several different sequences, each added at a known concentration.

And yes, they are good for normalization between samples; real sequences can be more variable.
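
As a rough illustration of why several spike-ins at known concentrations help, the sketch below fits a straight line of log2 intensity against log2 concentration for one chip. This is only one simple possibility, not a specific published normalization method; the concentrations, intensities, and the function name `fit_log_line` are hypothetical. The fitted slope and intercept describe how that chip responds to known input, which is what you would then compare or adjust across chips.

```python
import math

# Hypothetical spike-ins on a single chip: known concentrations (arbitrary units)
# and the intensities measured for them.
known_concentrations = [1.0, 2.0, 4.0, 8.0]
measured_intensities = [50.0, 100.0, 200.0, 400.0]


def fit_log_line(concentrations, intensities):
    """Ordinary least-squares fit of log2(intensity) = slope * log2(concentration) + intercept."""
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(i) for i in intensities]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_log_line(known_concentrations, measured_intensities)
print(slope, intercept)
```

In this made-up example the intensity doubles whenever the concentration doubles, so the slope comes out at about 1 and the intercept at about log2(50).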