Strange RNA-Seq Results When Adding Covariates
Entering edit mode
5 months ago

Hi all,

I am performing an RNA-sequencing experiment using primary human fibroblasts. The study is a simple case/control study with a treated and untreated condition.

I have begun my analyses by comparing case/control while disregarding treatment. When I don't include covariates I get 90 significant genes out of a count matrix of ~26k. These seem to make biological sense.

However, when I include age as a covariate, I suddenly get almost 9k significant genes. I have also generated histograms of the age distribution in the cases vs controls and they seem to be relatively similar.

It is also worth noting that HOX genes are a confounder in my experiment. However, I am unsure why they cropped up as differentially expressed when accounting for age but when I didn't include age in the model.

Would you have any ideas as to why this has happened?

Code below written in rstudio.

Analysis Without Covariate

dds <- DESeqDataSetFromMatrix(counts, meta, design = ~condition)
dds <- DESeq(dds)

contrast_condition <- c("condition", "case", "control")

#Extract results
res_table <- results(dds, contrast = contrast_condition, alpha = 0.05)

#Apply LFC shrinkage 
res_table <- lfcShrink(dds, coef = "condition_case_vs_control", type = "apeglm")

res_table <- res_table %>%
  data.frame() %>%
  rownames_to_column(var = "gene") %>%

#Extract significant results
sig_res <- res_table %>%
  dplyr::filter(padj < 0.05)

Top Sig Genes No Covariate

no covariate

Analysis with Age as A Covariate

#Scale and centre the age variable as numeric variables with high mean can impair the model fit
meta$age_months_scaled <- scale(meta$age_months, center = TRUE)

#Remove NAs from age$months as DESeq can't handle NAs
meta_na.rm <- meta[!$age_months), ]

counts_na.rm <- counts[ , meta_na.rm$sample_name]

#Run the analysis on DESeq dataset
dds_age <- DESeqDataSetFromMatrix(counts_na.rm, meta_na.rm, design = ~age_months_scaled + condition)

dds_age <- DESeq(dds_age)

#Define contrast
contrast_condition <- c("condition", "case", "control")

#Extract results
res_table_age <- results(dds_age, contrast = contrast_condition, alpha = 0.05)

#Apply LFC shrinkage 
res_table_age <- lfcShrink(dds_age, coef = "condition_case_vs_control", type = "apeglm")
#No LFC shrinkage errors

res_table_age <- res_table_age %>%
  data.frame() %>%
  rownames_to_column(var = "gene") %>%

#Extract significant results
sig_res_age <- res_table_age %>%
  dplyr::filter(padj < 0.05)

Top Sig Genes With Covariate

enter image description here

covariates RNA-Seq DESeq2 • 378 views
Entering edit mode

It would probably be worth doing a bit of data exploration with e.g. PCA to better understand the factors contributing to the variance in your data. This becomes more important as you start adding covariates to a model.


Login before adding your answer.

Traffic: 2038 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6