Why is this interaction result not significant in DESeq2
2
0
Entering edit mode
4 months ago
Guillermo ▴ 10

I was exploring my DE results from DESeq2 and noticed that some of my DEGs seemed to be particularly enriched in groups with a particular combination of factors. Therefore, I decided to carry out an interaction analysis using the following code (I include the gene Hspa1a as an example of a gene higher in samples with XY chromosomes and testes):

# Define formula including interaction term
formula <- as.formula("~chromosomes + gonads + chromosomes:gonads")

dds <- DESeqDataSetFromMatrix(countData = round(counts_filtered),
                              colData = metadata_filtered,
                              design = formula)

# PRemove genes with all samples <10 counts")
toRemove <- rowSums(counts(dds) < 10) == ncol(dds) # ncol(dds) equals N.samples
dds <- dds[!toRemove,]

### DE analysis

dds <- DESeq(dds)

# From metadata:
# chromosomes: Factor with 2 levels: "XX", "XY"
# gonads: Factor with 2 levels: "ov", "te"

# Adapted from ?results:

# the gonad effect for chromosomes XX (the main effect)
results(dds, name="gonads_te_vs_ov") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

# # the gonad effect for chromosomes XY
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in XY compared to genotype XX).
results(dds, list( c("gonads_te_vs_ov", "chromosomesXY.gonadste") )) %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

# the interaction term, answering: is the gonad effect *different* across sex chromosome complement?
results(dds, name="chromosomesXY.gonadste") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")

### Plotting
vst_normCounts <- assay(vst(dds, blind=TRUE)) %>% 
  as.data.frame() %>%
  rownames_to_column("symbol") %>% 
  dplyr::filter(symbol == "Hspa1a") %>%
  pivot_longer(!symbol, names_to = "id", values_to = "vst_exp")

# Get the useful metadata columns for this interaction and add to counts
vst_columns <- c("id", "chromosomes", "gonads")

metadata_vst <- metadata_filtered %>%
  rownames_to_column("id") %>% 
  dplyr::select(all_of(vst_columns))

vstPlot_df <- vst_normCounts %>% 
  dplyr::left_join(metadata_vst, by = "id")

ggplot(vstPlot_df, aes(x = chromosomes, y = vst_exp, color = gonads)) +
  geom_point(size = 3) +
  scale_colour_discrete(name = str_to_title(colnames(vstPlot_df)[5])) +
  ggtitle(vstPlot_df$symbol[1]) +
  xlab("Sex chromosome complement") +
  ylab("Normalised expression") +
  theme_light(base_size = 18) +
  theme(plot.title = element_text(hjust = 0.5))

The results commands output the following:

> results(dds, name="gonads_te_vs_ov") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat    pvalue      padj
1 Hspa1a 21847.06      0.2173161 0.1878391 1.156927 0.2473022 0.9416609
> 
> # the gonad effect for chromosomes XY
> # this is, by definition, the main effect *plus* the interaction term
> # (the extra condition effect in XY compared to genotype XX).
> results(dds, list( c("gonads_te_vs_ov", "chromosomesXY.gonadste") )) %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat       pvalue        padj
1 Hspa1a 21847.06      0.8390135 0.2028182 4.136777 3.522188e-05 0.008823081

> 
> # the interaction term, answering: is the gonad effect *different* across sex chromosome complement?
> results(dds, name="chromosomesXY.gonadste") %>% as.data.frame() %>% rownames_to_column("gene") %>% dplyr::filter(gene == "Hspa1a")
    gene baseMean log2FoldChange     lfcSE     stat     pvalue      padj
1 Hspa1a 21847.06      0.6216974 0.2764394 2.248947 0.02451588 0.9996121

How is it possible that there is such a clear difference in values between gonads for XY samples compared to XX samples but the interaction term is nowhere close to significant? Am I not understanding properly how interactions work? Did I do something wrong in my code, or a larger effect size would be needed for significance? See the plot below:

VST-normalised gene expression by sex chromosome complement and gonads for Hspa1a

DEG DE interaction DESeq2 RNA-seq • 695 views
ADD COMMENT
2
Entering edit mode
4 months ago
ATpoint 89k

Noise in te of XX group is considerable (over one log2 order), meaning more than a 2-fold change within a group, so that is probably at least one of the reasons.

ADD COMMENT
1
Entering edit mode
4 months ago

Your lfcSE is too high which means your stat is low. From the comparison "gonads_te_vs_ov":

stat = log2FoldChange / lfcSE

1.15692685921 = 0.2173161 / 0.1878391

The probability for a stat value of around 1.16 is 0.8770 based on a normal distribution table.

With this information you calculate the p-value as such:

1 - 0.8770 = 0.123

0.123 * 2 = 0.246

This is consistent with the p-value you are quoted by DESeq2.

Basically, your standard error is too high because the data is noisy.

ADD COMMENT

Login before adding your answer.

Traffic: 3666 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6