Question

Why is this interaction result not significant in DESeq2

0

Entering edit mode

4 months ago

Guillermo ▴ 10

I was exploring my DE results from DESeq2 and noticed that some of my DEGs seemed to be particularly enriched in groups with a particular combination of factors. Therefore, I decided to carry out an interaction analysis using the following code (I include the gene Hspa1a as an example of a gene higher in samples with XY chromosomes and testes):

# Define formula including interaction term
formula <- as.formula("~chromosomes + gonads + chromosomes:gonads")

dds <- DESeqDataSetFromMatrix(countData = round(counts_filtered),
                              colData = metadata_filtered,
                              design = formula)

# PRemove genes with all samples <10 counts")
toRemove <- rowSums(counts(dds) < 10) == ncol(dds) # ncol(dds) equals N.samples
dds <- dds[!toRemove,]

### DE analysis

dds <- DESeq(dds)

# From metadata:
# chromosomes: Factor with 2 levels: "XX", "XY"
# gonads: Factor with 2 levels: "ov", "te"

# Adapted from ?results:

# the gonad effect for chromosomes XX (the main effect)
results(dds, name="gonads_te_vs_ov") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

# # the gonad effect for chromosomes XY
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in XY compared to genotype XX).
results(dds, list( c("gonads_te_vs_ov", "chromosomesXY.gonadste") )) %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

# the interaction term, answering: is the gonad effect *different* across sex chromosome complement?
results(dds, name="chromosomesXY.gonadste") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

### Plotting
vst_normCounts <- assay(vst(dds, blind=TRUE)) %>% 
  as.data.frame() %>%
  rownames_to_column("symbol") %>% 
  dplyr::filter(symbol == "Hspa1a") %>%
  pivot_longer(!symbol, names_to = "id", values_to = "vst_exp")

# Get the useful metadata columns for this interaction and add to counts
vst_columns <- c("id", "chromosomes", "gonads")

metadata_vst <- metadata_filtered %>%
  rownames_to_column("id") %>% 
  dplyr::select(all_of(vst_columns))

vstPlot_df <- vst_normCounts %>% 
  dplyr::left_join(metadata_vst, by = "id")

ggplot(vstPlot_df, aes(x = chromosomes, y = vst_exp, color = gonads)) +
  geom_point(size = 3) +
  scale_colour_discrete(name = str_to_title(colnames(vstPlot_df)[5])) +
  ggtitle(vstPlot_df$symbol[1]) +
  xlab("Sex chromosome complement") +
  ylab("Normalised expression") +
  theme_light(base_size = 18) +
  theme(plot.title = element_text(hjust = 0.5))

The results commands output the following:

> results(dds, name="gonads_te_vs_ov") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat    pvalue      padj
1 Hspa1a 21847.06      0.2173161 0.1878391 1.156927 0.2473022 0.9416609
> 
> # the gonad effect for chromosomes XY
> # this is, by definition, the main effect *plus* the interaction term
> # (the extra condition effect in XY compared to genotype XX).
> results(dds, list( c("gonads_te_vs_ov", "chromosomesXY.gonadste") )) %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat       pvalue        padj
1 Hspa1a 21847.06      0.8390135 0.2028182 4.136777 3.522188e-05 0.008823081

> 
> # the interaction term, answering: is the gonad effect *different* across sex chromosome complement?
> results(dds, name="chromosomesXY.gonadste") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat     pvalue      padj
1 Hspa1a 21847.06      0.6216974 0.2764394 2.248947 0.02451588 0.9996121

How is it possible that there is such a clear difference in values between gonads for XY samples compared to XX samples but the interaction term is nowhere close to significant? Am I not understanding properly how interactions work? Did I do something wrong in my code, or a larger effect size would be needed for significance? See the plot below:

VST-normalised gene expression by sex chromosome complement and gonads for Hspa1a

DEG DE interaction DESeq2 RNA-seq • 695 views

ADD COMMENT • link updated 4 months ago by yura.grabovska ▴ 820 • written 4 months ago by Guillermo ▴ 10

score 2 · Answer 1 · 2025-04-29

2

Entering edit mode

4 months ago

ATpoint 89k

Noise in te of XX group is considerable (over one log2 order), meaning more than a 2-fold change within a group, so that is probably at least one of the reasons.

ADD COMMENT • link 4 months ago by ATpoint 89k

score 1 · Answer 2 · 2025-04-30

Your lfcSE is too high which means your stat is low. From the comparison "gonads_te_vs_ov":

stat = log2FoldChange / lfcSE

1.15692685921 = 0.2173161 / 0.1878391

The probability for a stat value of around 1.16 is 0.8770 based on a normal distribution table.

With this information you calculate the p-value as such:

1 - 0.8770 = 0.123

0.123 * 2 = 0.246

This is consistent with the p-value you are quoted by DESeq2.

Basically, your standard error is too high because the data is noisy.