I am trying to do differential expression analysis of microarrays with the limma package. I usually compare two groups. However, when each group has only one sample, it fails when I reach the eBayes step. My code comes from GEO2R. When one group contains more than one sample, it works. Does anyone know why that is? How can I do differential expression between only two samples? Thank you.
This comes down to the basics of statistics. Though I am not a stats expert, I will try to clarify the issue a bit.
For each gene measured on your microarrays, you are asking limma two things: whether the mean expression in group 1 differs from the mean expression in group 2, and whether that difference is larger than what you would expect from the random variation inherent in microarray profiling. However, with only one sample in each of the two groups, the mean of each group is simply that sample's expression value, and the variance within each group cannot be estimated at all.
Compare it with asking whether one person called Peter is significantly taller than one person called Susan. He is simply taller (or not); you cannot do any statistics on that. If you instead ask whether men in general are taller than women, and you have measured 20 men and 20 women, you can take the mean of each group and the variance within each group and ask whether the difference in mean height between men and women is significant.
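To make the Peter and Susan point concrete, here is a minimal Python sketch. The heights are made-up numbers for illustration only, and `sample_variance` is a hypothetical helper, not part of limma. It shows that the usual sample variance (with an n - 1 denominator) is undefined for a single observation but works fine once a group has several members.

```python
import statistics

def sample_variance(values):
    """Return the sample variance (n - 1 denominator), or None if it is undefined."""
    if len(values) < 2:
        # A single observation carries no information about spread:
        # the formula would divide by n - 1 = 0.
        return None
    return statistics.variance(values)

# Made-up heights in cm, for illustration only.
peter = [183.0]
men = [183.0, 176.5, 190.2, 171.8]

print("Peter alone:", sample_variance(peter))   # undefined, so None
print("Group of men:", sample_variance(men))    # a real number
```

The same holds per gene on a microarray: one expression value per group gives a mean, but no spread to compare the difference against.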
I hope someone can explain it in more statistically sound terms.
In order to apply a t-test between two samples (which is, in the end, what limma is doing, albeit in a fancy way), we need to know three things about each of the two samples:
- The mean
- The sample size
- The variance (or more specifically the standard deviation)
In your case you have an estimate of the mean (the measurement of expression for a given gene on your one replicate per sample) and the sample size (1), but you have no way of calculating the variance (if we assume the two populations your samples are taken from are independent - a reasonable assumption, given the hypothesis we are testing).
Even if your one measurement gave us a good estimate of the mean (and this is by no means certain - there are many, many reasons why your sample could be an outlier), without the variance we simply have no route to calculate the t-statistic for the tests.
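As a rough illustration of why the three ingredients above are all needed, here is a sketch of a plain Welch t-statistic in Python. This is a simplification and an assumption on my part: limma actually computes a moderated t-statistic from a linear model, not this formula, and `welch_t` and the example values are hypothetical. The dependence on a variance estimate is the same, though.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch t-statistic for two independent groups of measurements."""
    if len(group_a) < 2 or len(group_b) < 2:
        # Without at least two replicates per group there is no variance
        # estimate, so the denominator of the statistic cannot be formed.
        raise ValueError("each group needs at least two replicates")
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Made-up log2 expression values for one gene, for illustration only.
print(welch_t([7.1, 7.4, 6.9], [8.3, 8.0, 8.6]))

try:
    welch_t([7.1], [8.3])  # one sample per group
except ValueError as err:
    print("Cannot test:", err)
```

With one value per group the difference in means (7.1 versus 8.3) is still there, but there is nothing to divide it by, so no test statistic and no p-value.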
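For this fairly simple technical reason, it is impossible to apply limma (or indeed any valid statistical test) when neither of your sample groups has replicates. If only one group is unreplicated, limma can still run because it estimates a single variance per gene across all samples, which is why your analysis works in that case, but the estimate then rests on very few degrees of freedom. This is just one of the many reasons why replicates are essential in microarray (and indeed RNA-Seq) experiments.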