Hi all,
I have been running a DESeq2 analysis and have encountered an issue I have not seen before. My log2 fold changes appear to be reversed (i.e. when a gene looks downregulated, the fold change indicates upregulation). I assume I have made some basic error and set my reference group incorrectly, but I can't figure out where.
Please see below. I am examining a disease status variable (LS) with 3 levels (LS, Non_LS, Control). In this analysis, I am specifically comparing LS vs Control. Treatment and sex are covariates.
In previous analyses I have always had the reference group placed last in the contrast (i.e. case vs control) and it has worked fine. Any idea what went wrong?
meta$LS <- as.factor(meta$LS)
meta$LS <- relevel(meta$LS, ref = "Control")
dds_LS <- DESeqDataSetFromMatrix(counts, meta, design = ~ LS + treatment + Sex)
dds_LS <- DESeq(dds_LS)
contrast <- c("LS", "LS", "Control")
res_table_LS <- results(dds_LS, contrast = contrast, alpha = 0.05)
resultsNames(dds_LS)
[1] "Intercept"                      "LS_LS_vs_Control"              
[3] "LS_Non_LS_vs_Control"           "treatment_galactose_vs_glucose"
[5] "Sex_M_vs_F"
res_table_LS <- lfcShrink(dds_LS, coef = "LS_LS_vs_Control", 
                                type = "apeglm")
How do you know that it is reversed? That design looks sufficiently complex that it would be difficult to tell from the raw counts alone.
Please add an illustrative example that shows why you think it is inverted. This has been asked many times before, e.g. over at support.bioconductor.org, and the condensed answer is this: if you correct for (many) covariates other than the main covariate of interest, then, depending on the magnitude of the impact of those other covariates, you might get logFC estimates that do not agree with what you see when plotting the univariate (= uncorrected) counts.
Thanks both,
This is very informative. I suppose I am mainly surprised by the size of the discrepancy between what the logFC and the mean counts are telling me (although admittedly, mean counts per group are a very crude measure).
Two examples are shown below:
SPTL1: LS mean: 297, Control mean: 1536, LFC: 1.6
TXNIP: LS mean: 8798, Control mean: 12000, LFC: 1.47
However, from what you have said, this is not something to worry about due to the complexity of the design?
Well, you certainly shouldn't rely on the condition means alone. You could easily be falling victim to Simpson's paradox. Looking at the individual sample counts, split by the covariates, might give you a better idea of whether something is off or not.
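To make the Simpson's paradox point concrete, here is a minimal sketch in base R. The numbers are made up for a single hypothetical gene and are not taken from the data in this thread. It also uses an ordinary linear model on log2 counts rather than DESeq2's negative binomial GLM, so treat it as an analogy for how a covariate-adjusted estimate can disagree in sign with the pooled group means, not as a reproduction of the DESeq2 fit.

```r
# Toy data for ONE gene (hypothetical numbers, not from the thread).
# Assumptions baked into the toy:
#   - galactose raises expression 8-fold relative to glucose
#   - within each treatment, LS is 2-fold HIGHER than Control
#   - the design is unbalanced: most LS samples are on glucose,
#     most Control samples are on galactose
toy <- data.frame(
  LS = factor(c(rep("Control", 4), rep("LS", 4)),
              levels = c("Control", "LS")),
  treatment = factor(c("glucose", "galactose", "galactose", "galactose",
                       "glucose", "glucose", "glucose", "galactose"),
                     levels = c("glucose", "galactose")),
  count = c(100, 800, 800, 800,
            200, 200, 200, 1600)
)

# Pooled group means, ignoring treatment (the "crude" view).
pooled_means <- tapply(toy$count, toy$LS, mean)
naive_lfc <- log2(pooled_means[["LS"]] / pooled_means[["Control"]])

# Means within each treatment stratum (the per-covariate view).
stratum_means <- tapply(toy$count, list(toy$LS, toy$treatment), mean)

# LS effect adjusted for treatment, on the log2 scale.
fit <- lm(log2(count) ~ LS + treatment, data = toy)
adjusted_lfc <- coef(fit)[["LSLS"]]

print(pooled_means)
print(stratum_means)
cat("naive log2FC (LS vs Control):   ", round(naive_lfc, 2), "\n")
cat("adjusted log2FC (LS vs Control):", round(adjusted_lfc, 2), "\n")
```

In this toy, the pooled LS mean (550) is lower than the pooled Control mean (625), giving a naive log2FC of about -0.18, yet LS is higher than Control within both glucose and galactose, and the treatment-adjusted log2FC is 1. Whether something similar is happening for SPTL1 or TXNIP can only be judged by looking at the per-sample counts split by treatment and sex.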