Entering edit mode
5 weeks ago
bthom
▴
10
Can someone help me to understand why the following code returns different results for df (66 DEGs) and df_1 (113 DEGs). As far as I understand, the two code chunks should be interchangeable (aside from a difference in the signs of the log2FC).
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = sample_matrix,
design = ~MFG)
dds$MFG <- relevel(dds$MFG , ref = "Good_Unknown")
dds <- DESeq(dds)
res <- results(dds, contrast = c("MFG", "Good_Unknown", "Fail"))
resLFC<- lfcShrink(dds, coef = "MFG_Fail_vs_Good_Unknown", type="apeglm")
df <- as.data.frame(resLFC)
df |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)
and
dds_1 <- DESeqDataSetFromMatrix(countData = cts,
colData = sample_matrix,
design = ~MFG)
dds_1$MFG <- factor(dds_1$MFG, levels = c("Fail","Good_Unknown"))
dds_1 <- DESeq(dds_1)
res_1 <- results(dds_1)
resultsNames(dds_1)
resLFC_1 <- lfcShrink(dds_1, coef="MFG_Good_Unknown_vs_Fail", type="apeglm")
df_1 <- as.data.frame(resLFC_1)
df_1 |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)
It has been reported before that the choice of reference level can lead to some differences in how apeglm estimates the shrunken fold changes. And since you filter on these that ight be the cause. Can you check whether padj are the same between both and fold changes are a bit different? The solution anyway is to decide for one reference level and stick with it. Unfortunately, these sorts of statistical procedures have the downside that they might show some unexpected behaviour in some corner cases, so decide for one and go on.
Hm, thanks for the insight. It seems like both the fold changes and the padj are slightly different. Can you explain how the choice of the reference levels differs between my two code chunks and how this can alter apeglm's behavior? Shouldn't they be the same?
I cannot explain why it alters the behaviour, that is beyond my understanding of the apeglm internals, but guess who asked this exact same question and got an answer here :-D https://support.bioconductor.org/p/p134551/
Thanks for sharing this! Would it be bad practice to reply to Mike's comment on bioconductor? I see that he had a possible explanation for your case (>6 levels), but here I only have 2 levels.
I assume the difference in whether factor was used isn't doing anything because the elements of MFG are strings, and not numbers?
I am not sure that I follow, factors in R are stored as integers with labels assigned to them. Furthermore, the code example in the DESeq vignette uses strings: "condition","treated","untreated"