Question

Extreme shrinkage in DESeq2

2

Entering edit mode

16 months ago

DCZ ▴ 10

Hello,

I'm processing a bulk RNa-seq dataset which was aligned using STAR, counted using Salmon. The quant.sf files were then processed by tximport into a txi object, with countsFromAbundance = "no".

A dds was created using DESeqDataSetFromTximport, then the normal DESeq2 analysis was performed. I noticed, however, an enormous amount of Differentially expressed genes:

out of 56570 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 4552, 8%
LFC < 0 (down)     : 29409, 52%
outliers [1]       : 0, 0%
low counts [2]     : 3305, 5.8%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Also, when looking at the MA plots, the shrinkage using apeglm seems to shrink every non-significantly differentially expressed gene to 0.

plotMA(res, alpha = FDR, main = paste0("Unshrunken MA-Plot (FDR = ", FDR, ")"), ylim = c(-5,5) )

unshrunken MA plot

plotMA(resLFC, alpha=FDR, main = paste0("MA-plot (FDR = ", FDR, ")"), ylim = c(-5,5) )

shrunken MA plot

I noticed many of the genes have a normalized count < 1.

sum(rowMeans(counts(full_dds, normalized = T))<1)
--> 11,474

Here's the corresponding pca plot (red: ConditionA, Green Condition B)

Here are the raw counts per condtion Raw_counts

Could these characteristics be due to the pseudo-aligner counts & the use of tximport? Would anyone be able to help me with the interpretation of this result?

Thanks in advance

salmon shrinkage tximport DESeq2 • 1.4k views

ADD COMMENT • link updated 16 months ago by ATpoint 82k • written 16 months ago by DCZ ▴ 10

score 1 · Answer 1 · 2022-12-23

1

Entering edit mode

16 months ago

ATpoint 82k

Looks pretty violent. What are you comparing here in terms of experimental setup? Are extreme changes expected like cancer vs normal or contrasting different tissues? Also please add a PCA and the depth per sample (colSums of raw counts) to see whether one group systematically is undersequenced and you have lots of low counts nested by group. Also, gry a new analysis with loe counts filtered, soo the DESeq2 vignette or use filterByExpr from edgeR.

ADD COMMENT • link 16 months ago by ATpoint 82k

0

Entering edit mode

Thanks for the speedy response. It's a comparison of the same type of cancer (tissue) in human with different inflammation characteristics. Differences are to be expected, but not as violent as here. There's no matching between patients possible. I added the PCA plot & a violinplot with raw reads in the original post. Filtering applied is rowSums(counts(dds)) > 1. I was under the impression that low counts would not contribute a lot to the analysis. I'll rerun the analysis with dds[rowSums(counts(dds, normalized = TRUE) >=5) >=3], which already filters out 9,424 genes.

ADD REPLY • link 16 months ago by DCZ ▴ 10

1

Entering edit mode

It seems A has systematically lower depth. You can color the PCA by depth and see whether that separation in early PCs is due to depth. If so you would probably need to resequence A or do some sort of subsampling normalization to downscale B. Is this A vs B in the results? If so then most downregulated genes are probably due to lower depth/dropouts in A rather than real expression differences.

ADD REPLY • link 16 months ago by ATpoint 82k

0

Entering edit mode

This seems to be the case! Would you have an idea why differences in depth are not accounted for by sizeFactors (in this case normalizationFactors?) enter image description here

ADD REPLY • link 16 months ago by DCZ ▴ 10

1

Entering edit mode

Check the counts. If it is zero in A then any size factor won’t change that. It is probably close to zero or zero in some or all samples of A. That would be dropouts. 0*anything is always zero so the normalization by scaling has no effect. You can try to subsample the deeper sample (downsampleMatrix from the scuttle package can do that) or resequence to greater depth.

ADD REPLY • link 16 months ago by ATpoint 82k

0

Entering edit mode

Thank you for the suggestion. However, there are no zero counts in the data. rowMeans(counts(full_dds, normalized = T)) results in 11,474 values between 0 & 1. I suppose this is due to the tximport with countsFromAbundance = "no". Moreover, Michael Love suggests here not to downsample, since the sizeFactors should handle the difference in library sizes. Does this not apply to counts generated by pseudoaligners like Salmon?

ADD REPLY • link 16 months ago by DCZ ▴ 10

0

Entering edit mode

Your plots show that size factors do not take care of it though, depth is the biggest confounder. Try subsampling, and see whether this improves PCA, that is my suggestion. Others may have different strategies.

ADD REPLY • link 16 months ago by ATpoint 82k