Question

Choosing the correct shrinkage type

1

Entering edit mode

13 months ago

Chiara ▴ 10

Hi everyone,

First time poster here - tried to look for the answer but I do not seem to find exactly what I am looking for.

I have a question re log2FC shrinking as part of my DEse2 pipeline. I cannot understand which type of shrinkage I should be using.

Data is from totalRNA extracted from a tissue + vs - KO of a gene of interest. Aligned to genome with STAR, deduplicated, and then gene counts extracted with featureCounts. The comparison is a simple 4 WT replicates vs 4 KO replicates. No complex experimental design.

I know in advance that the knock out affects a lot of genes which are highly expressed. We normally would shrink the log2FC using apeglm but this time, if I observe the MA plot after apeglm, the results look very weird.

These are the unshrunken, the apeglm shrunken, and the normal shrunken MA plots Normal apeglm

One can see here, although not easily, that there are some significant blue dots almost on the 0 axis, in the apeglm shrunken data. Something I could also observe in my volcano plots.

normal

From what I understand the effect of shrinking should be, I should be trusting in this case the normal one, more than the apeglm? Considering that despite the apeglm shrinking, I still get some log2FC for lowly expressed genes to be relatively high (ie +3 log2FC)

Additionally, although it is sometimes only a moderate effect, the genes significantly regulated have bigger log2FC (in absolute value) when I do apeglm, even those more highly expressed.

Is there any way to confidently say which method is correct? Is the shrinkage affected by the fact that I do not have a "balanced" amount of genes changing in positive vs negative?

Thanks very much for your help.

Chiara

Here below the code I used for the plots after running the DESeq for making the plots

lfc_shrink <- lfcShrink(dds, coef=paste("condition", treatment, "vs", control, sep = "_"), type="apeglm")
lfc_shrink_n <- lfcShrink(dds, coef=paste("condition", treatment, "vs", control, sep = "_"), type="normal")

FDR = 0.1
DESeq2::plotMA(res, alpha = FDR, main = paste0("Unshrunken MA-Plot (FDR = ", FDR, ")"), ylim = c(-5,5))
DESeq2::plotMA(lfc_shrink, alpha = FDR, main = paste0("apeglm MA-Plot (FDR = ", FDR, ")"), ylim = c(-5,5))
DESeq2::plotMA(lfc_shrink_n, alpha = FDR, main = paste0("normal shr MA-Plot (FDR = ", FDR, ")"), ylim = c(-5,5))

DEseq2 apeglm lfcShrink • 574 views

ADD COMMENT • link updated 13 months ago by LChart 3.9k • written 13 months ago by Chiara ▴ 10

score 1 · Answer 1 · 2023-03-15

Whatever method you are using, you are replacing maximum likelihood estimates with maximum a-posteriori estimates. The question of which prior is "right" is really one of which is closer to what you expect to be happening.

For instance, if comparing two conditions, you might expect there to be a combination of many small changes, some moderate changes, and a few large changes; in which case a normal prior might be appropriate. But if comparing a gene knockout vs wild-type in a single cell line, you might instead expect most genes to remain unchanged, while a few genes alter their expression, in which case a Laplace (or whatever apeGLM uses) might be more fitting.

If you have a good external biological replicate (GEO study for the same or highly similar contrast, for instance), you can see which prior, when applied to both your dataset and the external dataset, shows the highest concordance (in terms of correlation of logFCs). This could be different for each contrast so this won't give you a universal answer.