Answered by Michael Love on bioconductor - Bioconductor link
I am using DESeq2 v1.22.2 to test for changes in tumors over time pre and post treatment. However when I use the results function, I get log2FC on the order of +/- 30, which is of course unreasonably large. If I follow up with lfcshrink + ashr shrinkage, these barely change and if I use lfcshrink + normal shrinkage, these effects are shrunken to nearly 0. I understand "ashr" generally performs better than "normal" and that it preserves large LFC, but these LFC are unrealistically large with ashr. I have 2 questions:
- Is this expected behavior or does log2FC on this scale imply there is something fundamentally wrong with my design or samples?
- If this is expected behavior, should I go with "ashr" or "normal"? I am not using apeglm because I need the contrast argument.
Not all DEGs have such massive LFC - it seems to almost be bimodal with significant genes either having L2FC > 25 or < 5.
There is significant variability in tumor type (breast, colorectal, melanoma etc), but I am hoping this effect is captured by a paired design. Samples generally cluster on PCA based on their patientID, not time. I have at minimum 2 replicates per timepoint, but usually 3 or more. There is also significant variability in RNA quality, which I have binned into "good" or "bad". My design is thus:
I then test the effect of time using the contrast argument. I have noticed that the extent of these massive LFC genes is slightly affected by the threshold I use for RIN, but nothing seems to eliminate the effect entirely. I have tried removing rbin, modeling RIN as a continuous variable, removing all tumors with low RIN, SVA (with 1, 2, and 3 SVs), filtering samples based on tumor type, and removing the PatientID variable with no luck.
I apologize if this question has been asked before and am happy to provide more info as needed. Thanks!
Edit - more info
PatientID disease time rbin PatientA A d1 low PatientA A d100 high PatientA A d200 low PatientB A d1 high PatientB A d100 low PatientC A d1 high PatientC A d100 low PatientC A d200 low PatientD B d1 high PatientD B d100 high PatientD B d200 low PatientE A d1 low PatientE A d100 high PatientE A d200 low PatientF C d1 low PatientF C d200 low PatientG D d1 high PatientG D d100 low PatientG D d200 high
# Read in metadata meta <- read.csv("meta.csv") # Import txi <- tximport(quantfiles, type = "salmon", tx2gene = tx2gene, importer = read_tsv, ignoreTxVersion=TRUE, countsFromAbundance="lengthScaledTPM") # Create DESeqDataSet dds <- DESeqDataSetFromTximport(txi = txi, colData = meta, design=~PatientID+rbin+time) # Run DESeq dds <- DESeq(dds) # DGE res <- list() res$d100vd1_noShrink<-results(dds,contrast=c("time","d100","d1")) res$d100vd1_normalShrink<-lfcShrink(dds,contrast=c("time","d100","d1"),type="normal") res$d100vd1_ashrShrink<-lfcShrink(dds,contrast=c("time","d100","d1"),type="ashr")
PCA (color = disease, label = patient):
MA plots with various shrinkages: