Question: DESeq2: Find differentially expressed genes specific to one condition
5
gravatar for Christian
4.0 years ago by
Christian2.8k
Cambridge, US
Christian2.8k wrote:

To all DESeq2 experts or biostatisticians out there:

Say I have an RNA-seq experiment with three conditions: A, B, and C. I want to know which genes are differentially expressed specifically in A; that is, different between A vs. B, A vs. C, but not B vs. C.

Solution 1) The naive approach of course is to perform pairwise comparisons and pull out genes specific to A via set overlaps of the three resulting gene lists. What I don't like about this is that I have to impose two arbitrary cut-offs: one to establish differential expression (say FDR < 0.01, |log2FoldChange| >= 1) and one to establish no differential expression (say FDR > 0.5, |log2FoldChange| < 1). Another disadvantage is that I am stuck with three different FDR and log2FoldChanges that I am not sure how to combine into a final gene ranking, for example for pre-ranked GSEA.

Solution 2) An alternative is to compare A with all non-A, but this would not guarantee me non-differential expression between B and C. Furthermore, I guess I sacrifice power by withholding the information that 'non-A' is actually composed of two conditions (B+C).

After implementing both solutions they still feel like a hack. Can this problem be solved in one nice integrated statistical framework that also allows me to nicely rank my resulting DEGs by statistical significance?

I read about contrasts and likelihood ratio tests in the DESeq2 vignette, but could not come up with an answer so far.

 

rna-seq deseq2 • 3.6k views
ADD COMMENTlink modified 2.1 years ago by Martombo2.4k • written 4.0 years ago by Christian2.8k

I suspect that solution 1 is the only viable one. Things like a single fold-change aren't defined in the comparison you're trying to make.

ADD REPLYlink written 4.0 years ago by Devon Ryan90k
1
gravatar for TriS
2.1 years ago by
TriS3.8k
United States, Buffalo
TriS3.8k wrote:

I had a similar situation, I used limma, not DESeq2 but the concept is the same. the first thing I did was to use your approach #1, which worked fine. then I started working on the contrast matrix.

let's say your data look like

> data
                  A          A           B           B          C           C
gene_1   0.73119617  1.0334765 -0.98585078 -1.70966079 -1.2937223 -0.03300298
gene_2  -0.49758823  1.2640492  1.07019633 -0.52513986  0.7001930 -2.06107495
gene_3   0.71012607  1.1087370  0.14327705 -1.34193321  2.8695163  0.33020928
gene_4   0.87754370  0.2782034  1.06616370  0.98361920 -0.4598298 -0.37822077
gene_5   0.09154438 -0.9534070  0.52547509  3.08489922  0.9944367  2.11002666
gene_6   0.38530556  1.3154407 -0.54681818 -1.38463685 -0.3651117  0.33112452
gene_7   1.21923553 -0.6029883  0.57609204 -2.43241917 -0.4122289 -0.34803657
gene_8   0.57848398  0.5944403  0.49081718  0.35163585  1.1472741 -1.36472408
gene_9   0.06271539  0.5102718  0.06256104 -1.81262365 -0.1244413  0.01500785
gene_10  1.52295243  0.7743652 -0.09160387 -0.08225646 -0.2866435  1.49100980

now build your design + contrast matrix

d <- model.matrix(~0+colnames(data))
colnames(d) <- c("A","B","C")
contr.matrix <- makeContrasts(
  AvsB = B-A, 
  AvsC = C-A, 
  BvsC = C-B,
  AvsAll = A-(B+C)/2,
  levels = colnames(d)
)

then fit the model

fit <- lmFit(data,d)
fit <- contrasts.fit(fit,contr.matrix)
fit <- eBayes(fit)
summary(decideTests(fit))

and this should be pretty close to what you look for

ADD COMMENTlink written 2.1 years ago by TriS3.8k
1
gravatar for Martombo
2.1 years ago by
Martombo2.4k
Seville, ES
Martombo2.4k wrote:

This seems to be quite a common problem. The method I chose to use is to compare AvsB and AvsC p-values through a p-value combination method, since I was not interested in specifying that the results would not be significant in BvsC. Otherwise you can try to use the following contrast: A - (B+C)/2. This again only compares A to the average of B and C, but at least the other DEA steps (like dispersion estimation) will treat B and C separately.

The only other thing I can think of is to compare B and C for no differential expression and to combine these p-values with the one you obtain from A vs avg(B,C).

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Martombo2.4k

How about an (A-(B+C)/2) - (B-C) model?

ADD REPLYlink written 2.1 years ago by i.sudbery4.8k

I think that would return a significant result if A=20, B=30, C=10? which is not what you want

ADD REPLYlink written 2.1 years ago by Martombo2.4k

(A - (B+C)/2) - (B-C) = (20 - (30+10)/2) - (30-10) = (20-20) -20 = -20

So yes, it would be significant, BUT what that would be telling you is that the gene is significantly less up in condition than other conditions, where as a positive value would be telling up it was more up.

ADD REPLYlink written 2.1 years ago by i.sudbery4.8k
0
gravatar for christian.cortes.campos
2.1 years ago by

Hi Christian,

I have the same issue. Did you resolve it?

Thanks,

Christian

ADD COMMENTlink written 2.1 years ago by christian.cortes.campos0

you can try this:

#dd is your DESeq matrix
 vsd <- varianceStabilizingTransformation(dd, blind=TRUE)
vst <- assay(vsd)
colnames(vst) <- colnames(count.table)
vst <- vst - min(vst)
dds <- DESeq(dd)

res.AvsB <- results(dds, contrast=c("treatment","A", "B"))
res.AvsC <- results(dds, contrast=c("treatment","A", "C"))

#select  expressed genes 

mat.A.B <- assay(vsd)[head(order(res.AvsB$padj)), ]
pheatmap(mat.A.B) 

mat.A.C <- assay(vsd)[head(order(res.AvsCt$padj)), ]
pheatmap(mat.A.C)
ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Lila M 470

Unfortunately not. I went ahead with the set-overlap solution, although I am still not fully satisfied with it from a statistical point of view.

EDIT: should have been a comment to the previous answer, not a reply to this comment.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Christian2.8k

How about trying to building your linear model to do a difference of differences test? You want to ask if the difference between the A/notA samples is bigger than the B-C difference. I suspect Gordon Smyth is the person to ask hope to do this, although you might need to be happy using limma voom/edgeR rather than DESeq2.

ADD REPLYlink written 2.1 years ago by i.sudbery4.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1630 users visited in the last hour