Differential Expression with limma: Contrast and Design Matrix, combinatorial approach valid?
15 days ago
I am struggling to make sense of my differential expression analysis. This is what my contrast matrix looks like: [image: contrast matrix]

When I look at the significantly differentially expressed genes, the genotype contrast returns far more genes than any of the individual group + bacterial exposure contrasts: approximately 50% of all genes detected. Is this a statistically valid approach to compare the impact of any bacterial exposure and genotype? This seems far too dramatic.

14 days ago

Is this a statistically valid approach to compare the impact of any bacterial exposure and genotype?

At a glance, it seems fine to me. With the genotype contrast you compare genotypes while accounting for the different conditions applied to each sample. With the other contrasts you compare genotypes within conditions. I think genotype gives many more genes with low p-values because you pool more samples. If each contrast level has two replicates, genotype is an 8 vs 8 comparison while the other contrasts are 2 vs 2. So, assuming the conditions affect gene expression in the same direction, the genotype contrast has more power.
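To put a rough number on the 8 vs 8 against 2 vs 2 argument, here is a minimal sketch in plain Python. It assumes a group-means layout of 2 genotypes x 4 conditions with 2 replicates each (the actual contrast matrix is not visible here, so the weights below are hypothetical) and a common residual standard deviation `sigma`. It only uses the textbook formula for the standard error of a contrast of group means; limma's moderated t-statistics add empirical Bayes variance shrinkage on top, which this ignores.

```python
import math

def contrast_se(weights, n_per_group, sigma=1.0):
    """Standard error of a contrast of group means: sigma * sqrt(sum(w_i^2 / n_i))."""
    return sigma * math.sqrt(sum(w * w / n for w, n in zip(weights, n_per_group)))

# Hypothetical layout: 2 genotypes x 4 conditions, 2 replicates per group.
# Groups are ordered KO.c1..KO.c4, WT.c1..WT.c4 (names are made up).
n = [2] * 8

# Within-condition contrast: KO.c1 - WT.c1 (a 2 vs 2 comparison).
within = [1, 0, 0, 0, -1, 0, 0, 0]

# Genotype contrast: mean of the KO groups minus mean of the WT groups (8 vs 8).
genotype = [0.25] * 4 + [-0.25] * 4

se_within = contrast_se(within, n)
se_genotype = contrast_se(genotype, n)

print(f"SE within condition:  {se_within:.3f}")
print(f"SE genotype (pooled): {se_genotype:.3f}")
print(f"t-statistic ratio for the same effect: {se_within / se_genotype:.1f}x")
```

Under these assumptions the pooled genotype contrast has half the standard error of a single within-condition contrast, so the same log-fold change gives a t-statistic twice as large.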

In my opinion, this is a limitation of using p-values to assess differential expression or any other effect. P-values keep going down as your sample size increases, without approaching a true value or a biologically meaningful value. The bigger the sample size, the more convincingly you reject the null hypothesis, unless a gene is completely unaffected by the treatment, which is rarely the case. For this reason I prefer to work with shrunk log-fold changes.
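A small illustration of that point, as a sketch rather than anything limma-specific: hold a small true effect fixed and see what happens to the p-value as the group size grows. The effect size (0.2 SD) and the group sizes are hypothetical, and the p-value comes from a normal approximation (a z-test evaluated at the expected test statistic), not from limma's moderated t-test.

```python
import math

def two_sided_p(effect, n_per_group, sigma=1.0):
    """Two-sided z-test p-value evaluated at the expected test statistic."""
    se = sigma * math.sqrt(2.0 / n_per_group)  # SE of a difference of two means
    z = effect / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

effect = 0.2  # hypothetical small true difference, in SD units
sizes = [2, 8, 32, 128, 512]
p_values = [two_sided_p(effect, n) for n in sizes]

for n, p in zip(sizes, p_values):
    print(f"n per group = {n:4d}  effect = {effect}  p = {p:.3g}")
```

The effect estimate stays the same in every row; only the p-value changes, which is why the size of the log-fold change is worth looking at alongside significance.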

In practice, I would also look at the normalized expression values to sanity-check that genes picked up by genotype but not by the other contrasts are sensible, rather than the result of a coding error.
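One way to do that check for a single gene, sketched in plain Python with made-up normalized log2 values (the genotype and condition labels are hypothetical too): compute the KO minus WT difference within each condition and see whether the differences point in the same direction, which is the assumption behind the pooled genotype contrast.

```python
from statistics import mean

# Made-up normalized log2 expression for one gene; keys are (genotype, condition).
expr = {
    ("KO", "c1"): [7.9, 8.1], ("WT", "c1"): [7.4, 7.6],
    ("KO", "c2"): [8.3, 8.5], ("WT", "c2"): [7.9, 8.0],
    ("KO", "c3"): [6.9, 7.2], ("WT", "c3"): [6.6, 6.7],
    ("KO", "c4"): [9.0, 9.2], ("WT", "c4"): [8.5, 8.8],
}

def genotype_differences(expr):
    """Return the KO minus WT mean expression for each condition."""
    conditions = sorted({cond for _, cond in expr})
    return {c: mean(expr[("KO", c)]) - mean(expr[("WT", c)]) for c in conditions}

diffs = genotype_differences(expr)
same_direction = (all(d > 0 for d in diffs.values())
                  or all(d < 0 for d in diffs.values()))

for cond, d in diffs.items():
    print(f"{cond}: KO - WT = {d:+.2f}")
print("Consistent direction across conditions:", same_direction)
```

A gene that is significant for genotype but shows differences of mixed sign or near zero in most conditions would be worth a closer look at the design and contrast definitions.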


