7.3 years ago

Avro

Hi everyone,

I am reading a book called "Statistics and Data Analysis for Microarrays Using R and Bioconductor". More specifically, I am looking at the limitations of microarrays, and I don't understand this sentence:

"The variance of average chip intensity among spike-in data sets is much lower than those measured in most real-life data sets, casting doubts on the general applicability of these data for developing analytical tools for highly diverse clinical expression profiles."

I have two questions:

- I understand that spike-ins are controls that you include in your sample preparation, but how are they used when you analyze/transform your data?
- What does the author mean by "the variance of average chip intensity among the spike-in data sets"? I know what variance is. For example, if I have 42 control genes, do I compute the average intensity for all of them for each array and then compute the variance?

Thank you!

Hi! Thank you for your answers! So, this is done so we can compare mRNA expression between different platforms/conditions. It's all about normalization. I'm sorry, but I don't understand your last sentence about using different concentrations of the same sequence.

So, these spike-ins are only good for normalizing since they do not reflect the full spectrum of gene expression variability, right?

Thank you very much!

No, you have several sequences (each one different) with several known concentrations as Spike-Ins.
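To make the "several sequences, several known concentrations" idea concrete, here is a minimal sketch of how such a panel could be used as a calibration curve on one array. The spike-in names, concentrations, and intensities are made up and idealized (perfectly linear on the log2 scale), and `fit_line` and `estimate_concentration` are hypothetical helpers, not functions from any Bioconductor package.

```python
import math

# Hypothetical spike-in panel on ONE array:
# name -> (known concentration, measured intensity). All values are invented.
spike_ins = {
    "spike_A": (1.0, 128.0),
    "spike_B": (2.0, 256.0),
    "spike_C": (4.0, 512.0),
    "spike_D": (8.0, 1024.0),
    "spike_E": (16.0, 2048.0),
}

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Work on the log2 scale, as is common for microarray intensities.
log_conc = [math.log2(conc) for conc, _ in spike_ins.values()]
log_int = [math.log2(inten) for _, inten in spike_ins.values()]
slope, intercept = fit_line(log_conc, log_int)

def estimate_concentration(intensity):
    """Invert the calibration line to map an intensity back to a concentration."""
    return 2 ** ((math.log2(intensity) - intercept) / slope)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"estimated concentration at intensity 512: {estimate_concentration(512.0):.2f}")
```

Because each array gets its own fitted line, the spike-ins give you a common reference for putting different arrays on a comparable scale. Real data will be noisier and will not be perfectly linear at the low and high ends.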

And yes, they are good for normalization between samples, but real sequences can be more variable.
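On the original question about "the variance of average chip intensity": one plausible reading is what you described, i.e. take the mean intensity of each array, then compute the variance of those per-array means across the data set. The sketch below assumes that reading; the intensities are invented for illustration, and `variance_of_chip_means` is a hypothetical helper. In practice you would probably do this on log-transformed intensities.

```python
from statistics import mean, variance

def variance_of_chip_means(arrays):
    """Sample variance of the per-array (per-chip) mean intensities."""
    chip_means = [mean(values) for values in arrays.values()]
    return variance(chip_means)

# Invented intensities: arrays in a spike-in style data set look alike...
spike_in_dataset = {
    "array_1": [90.0, 100.0, 110.0],
    "array_2": [92.0, 102.0, 112.0],
    "array_3": [88.0, 98.0, 108.0],
}

# ...while arrays from heterogeneous clinical samples differ much more overall.
clinical_dataset = {
    "patient_1": [60.0, 80.0, 100.0],
    "patient_2": [120.0, 150.0, 180.0],
    "patient_3": [90.0, 110.0, 130.0],
}

spike_in_var = variance_of_chip_means(spike_in_dataset)
clinical_var = variance_of_chip_means(clinical_dataset)

print("spike-in data set:", spike_in_var)
print("clinical data set:", clinical_var)
```

With these made-up numbers the spike-in set has a much smaller variance of chip means than the clinical set, which is the pattern the book's sentence is pointing at.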