Question: Two questions about spike-in data sets for microarray experiments
3.8 years ago by
Avro130
Avro130 wrote:

Hi everyone,

I am reading a book called "Statistics and Data Analysis for Microarrays Using R and Bioconductor". More specifically, I am looking at the limitations of microarrays, and I don't understand this sentence:

"The variance of average chip intensity among spike-in data sets is much lower than those measured in most real-life data sets, casting doubts on the general applicability of these data for developing analytical tools for highly diverse clinical expression profiles."

I have two questions:

1) I understand that spike-ins are controls that you include in your sample preparation, but how do they work when you analyze/transform your data?

2) What does the author mean by "the variance of average chip intensity among the spike-in data sets"? I know what variance is. For example, if I have 42 control genes, do I compute the average intensity for all of them for each array and then compute the variance?

Thank you!

3.8 years ago by
JC7.9k
Mexico
JC7.9k wrote:

1) Spike-in sequences are used to scale intensities properly across chips. Suppose you have one spike-in gene on two chips: if its expression level is 100 on the first chip and 200 on the second, you can either double all values on chip 1 or halve all values on chip 2. Of course, in practice you have more than one sequence, at different concentrations, so you can adjust your intensity distributions properly using more sophisticated methods.
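To make the single spike-in example above concrete, here is a minimal Python sketch. The probe names, the intensity values other than 100 and 200, and the helper `scale_to_reference` are all made up for illustration; real normalization uses many spike-ins and more sophisticated methods.

```python
def scale_to_reference(chip, spike_in_id, reference_level):
    """Rescale every intensity on a chip so that its spike-in probe
    matches reference_level (hypothetical helper, single spike-in only)."""
    factor = reference_level / chip[spike_in_id]
    return {probe: value * factor for probe, value in chip.items()}


# Hypothetical intensities; "spike_1" is the spike-in probe.
chip1 = {"spike_1": 100.0, "gene_a": 50.0, "gene_b": 400.0}
chip2 = {"spike_1": 200.0, "gene_a": 120.0, "gene_b": 700.0}

# Bring chip 1 onto the scale of chip 2: the factor is 200 / 100 = 2.
chip1_scaled = scale_to_reference(chip1, "spike_1", chip2["spike_1"])
print(chip1_scaled)
```

After scaling, the spike-in reads 200 on both chips, so the remaining genes can be compared on the same scale.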

2) Yes. But the point is that spike-in sequences have lower variance than the real genes in your samples, so they are of limited use as a model for real expression data.
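The calculation described in the question (average the intensities on each array, then take the variance of those averages across arrays) could be sketched in Python as follows. All numbers are invented purely to illustrate the contrast the book describes, and which probes go into each array's average is an assumption here.

```python
from statistics import mean, variance


def variance_of_chip_means(arrays):
    """Average the intensities on each array, then return the sample
    variance of those per-array averages."""
    chip_means = [mean(intensities) for intensities in arrays]
    return variance(chip_means)


# Hypothetical data: each inner list holds the intensities from one array.
spike_in_arrays = [
    [100.0, 102.0, 98.0],
    [101.0, 99.0, 100.0],
    [99.0, 101.0, 103.0],
]
clinical_arrays = [
    [80.0, 60.0, 100.0],
    [150.0, 170.0, 130.0],
    [220.0, 200.0, 240.0],
]

spike_var = variance_of_chip_means(spike_in_arrays)
clinical_var = variance_of_chip_means(clinical_arrays)
print(spike_var, clinical_var)
```

With these made-up values the spike-in arrays have nearly identical averages, while the "clinical" arrays differ widely from one another, which is the kind of gap the quoted sentence refers to.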

Hi! Thank you for your answers! So, this is done so we can compare mRNA expression between different platforms/conditions. It's all about normalization. I'm sorry, but I don't understand your last sentence about using different concentrations of the same sequence.

So, these spike-ins are only good for normalizing since they do not reflect the full spectrum of gene expression variability, right?

Thank you very much!

No, the spike-ins are several sequences (each one different), added at several known concentrations.

And yes, they are good for normalization between samples; real sequences can be more variable.