DESeq2: Find differentially expressed genes specific to one condition
3
5
Entering edit mode
7.4 years ago
Christian ★ 3.0k

To all DESeq2 experts or biostatisticians out there:

Say I have an RNA-seq experiment with three conditions: A, B, and C. I want to know which genes are differentially expressed specifically in A; that is, different between A vs. B, A vs. C, but not B vs. C.

Solution 1) The naive approach of course is to perform pairwise comparisons and pull out genes specific to A via set overlaps of the three resulting gene lists. What I don't like about this is that I have to impose two arbitrary cut-offs: one to establish differential expression (say FDR < 0.01, |log2FoldChange| >= 1) and one to establish no differential expression (say FDR > 0.5, |log2FoldChange| < 1). Another disadvantage is that I am stuck with three different FDR and log2FoldChanges that I am not sure how to combine into a final gene ranking, for example for pre-ranked GSEA.

Solution 2) An alternative is to compare A with all non-A, but this would not guarantee me non-differential expression between B and C. Furthermore, I guess I sacrifice power by withholding the information that 'non-A' is actually composed of two conditions (B+C).

After implementing both solutions they still feel like a hack. Can this problem be solved in one nice integrated statistical framework that also allows me to nicely rank my resulting DEGs by statistical significance?

I read about contrasts and likelihood ratio tests in the DESeq2 vignette, but could not come up with an answer so far.

deseq2 RNA-Seq • 5.8k views
0
Entering edit mode

I suspect that solution 1 is the only viable one. Things like a single fold-change aren't defined in the comparison you're trying to make.

2
Entering edit mode
5.5 years ago
TriS ★ 4.6k

I had a similar situation, I used limma, not DESeq2 but the concept is the same. the first thing I did was to use your approach #1, which worked fine. then I started working on the contrast matrix.

let's say your data look like

> data
A          A           B           B          C           C
gene_1   0.73119617  1.0334765 -0.98585078 -1.70966079 -1.2937223 -0.03300298
gene_2  -0.49758823  1.2640492  1.07019633 -0.52513986  0.7001930 -2.06107495
gene_3   0.71012607  1.1087370  0.14327705 -1.34193321  2.8695163  0.33020928
gene_4   0.87754370  0.2782034  1.06616370  0.98361920 -0.4598298 -0.37822077
gene_5   0.09154438 -0.9534070  0.52547509  3.08489922  0.9944367  2.11002666
gene_6   0.38530556  1.3154407 -0.54681818 -1.38463685 -0.3651117  0.33112452
gene_7   1.21923553 -0.6029883  0.57609204 -2.43241917 -0.4122289 -0.34803657
gene_8   0.57848398  0.5944403  0.49081718  0.35163585  1.1472741 -1.36472408
gene_9   0.06271539  0.5102718  0.06256104 -1.81262365 -0.1244413  0.01500785
gene_10  1.52295243  0.7743652 -0.09160387 -0.08225646 -0.2866435  1.49100980


now build your design + contrast matrix

d <- model.matrix(~0+colnames(data))
colnames(d) <- c("A","B","C")
contr.matrix <- makeContrasts(
AvsB = B-A,
AvsC = C-A,
BvsC = C-B,
AvsAll = A-(B+C)/2,
levels = colnames(d)
)


then fit the model

fit <- lmFit(data,d)
fit <- contrasts.fit(fit,contr.matrix)
fit <- eBayes(fit)
summary(decideTests(fit))


and this should be pretty close to what you look for

1
Entering edit mode
5.5 years ago
Martombo ★ 3.0k

This seems to be quite a common problem. The method I chose to use is to compare AvsB and AvsC p-values through a p-value combination method, since I was not interested in specifying that the results would not be significant in BvsC. Otherwise you can try to use the following contrast: A - (B+C)/2. This again only compares A to the average of B and C, but at least the other DEA steps (like dispersion estimation) will treat B and C separately.

The only other thing I can think of is to compare B and C for no differential expression and to combine these p-values with the one you obtain from A vs avg(B,C).

0
Entering edit mode

How about an (A-(B+C)/2) - (B-C) model?

0
Entering edit mode

I think that would return a significant result if A=20, B=30, C=10? which is not what you want

0
Entering edit mode

(A - (B+C)/2) - (B-C) = (20 - (30+10)/2) - (30-10) = (20-20) -20 = -20

So yes, it would be significant, BUT what that would be telling you is that the gene is significantly less up in condition than other conditions, where as a positive value would be telling up it was more up.

0
Entering edit mode
5.5 years ago

Hi Christian,

I have the same issue. Did you resolve it?

Thanks,

Christian

0
Entering edit mode

you can try this:

#dd is your DESeq matrix
vsd <- varianceStabilizingTransformation(dd, blind=TRUE)
vst <- assay(vsd)
colnames(vst) <- colnames(count.table)
vst <- vst - min(vst)
dds <- DESeq(dd)

res.AvsB <- results(dds, contrast=c("treatment","A", "B"))
res.AvsC <- results(dds, contrast=c("treatment","A", "C"))

#select  expressed genes

mat.A.B <- assay(vsd)[head(order(res.AvsB$padj)), ] pheatmap(mat.A.B) mat.A.C <- assay(vsd)[head(order(res.AvsCt$padj)), ]
pheatmap(mat.A.C)

0
Entering edit mode

Unfortunately not. I went ahead with the set-overlap solution, although I am still not fully satisfied with it from a statistical point of view.

EDIT: should have been a comment to the previous answer, not a reply to this comment.

0
Entering edit mode

How about trying to building your linear model to do a difference of differences test? You want to ask if the difference between the A/notA samples is bigger than the B-C difference. I suspect Gordon Smyth is the person to ask hope to do this, although you might need to be happy using limma voom/edgeR rather than DESeq2.