Question

DESeq2 design for two variables and pairwise comparisons

0

Entering edit mode

3.0 years ago

bioinfo8 ▴ 230

Hi,

I have rnaseq (some disease) samples from 7 families and my sample info is described below (affected=A, normal=N):

sample,family,condition,sex
AB1,1,A,F
AB2,1,N,F
AB3,2,A,M 
AB4,2,N,F
AB5,3,A,M
AB6,3,N,M
AB7,3,N,F
AB8,4,A,M
AB9,4,N,F
AB10,5,A,F
AB11,5,N,M
AB12,5,N,F
AB13,6,A,M
AB14,6,N,M
AB15,6,N,F
AB16,7,A,M
AB17,7,N,F

I am interested in the following comparisons:

1) condition_A_vs_N

2) All pairwise comparisons between affected samples from different families: A1_vs_A2, A1_vs_A3, .... (16 comparisons/DEGs lists).

3) Comparisons between affected samples from different families vs overall normal: A1_vs_N, A2,_vs_N, ...(7 DEGs lists)

I have tried:

1)

  dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ family + condition )  
  dds$group <- factor(paste0(dds$condition,dds$family)) 
  design(dds) <- ~ group 
  dds <- DESeq(dds)
  resultsNames(dds) 
             [1] "Intercept"      "group_N2_vs_N1" "group_N3_vs_N1" "group_N4_vs_N1" "group_N5_vs_N1" "group_N6_vs_N1" "group_A1_vs_N1" "group_A2_vs_N1"
             [9] "group_A3_vs_N1" "group_A4_vs_N1" "group_A5_vs_N1" "group_A6_vs_N1"

If I use contrast option, I can get 2. (one of the comparisons as an example), but not 1 and 3:

   results(dds, contrast=c("group", "A1", "A2")) 

    log2 fold change (MLE): group A1 vs A2 
    Wald test p-value: group A1 vs A2 
....................

2)

dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ family + condition + family:condition)    
dds <- DESeq(dds)
resultsNames(dds) 
 [1] "Intercept"     "family_2_vs_1" "family_3_vs_1" "family_4_vs_1" "family_5_vs_1" "family_6_vs_1" "condition_A_vs_N"  
 [8] "family2.conditionA" "family3.conditionA" "family4.conditionA" "family5.conditionA" "family6.conditionA"

Here, I get 1 (condition_A_vs_N), but not 2 and 3.

Do I need to use different design models to get the required comparisons? Please guide how should I proceed so that I can get 1, 2 and 3.

Also, as per manual, "The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables." Here I have more than 3 levels, should I go for LRT?

Thanks.

deseq2 contrasts rnaseq LRT • 3.6k views

ADD COMMENT • link updated 2.1 years ago by 1769mkc ★ 1.2k • written 3.0 years ago by bioinfo8 ▴ 230

1

Entering edit mode

Your comparison 2 is essentially a series of of 1 replicate vs one replicate comparisons, and I would almost always recommend against it. Replicates are necessary because it is necessary to estimate the dispersion for gene expression estimates. In theory it might be possible to estimate dispersion by assuming no differential expression, and effectively use all 16 samples simultaneously to estimate dispersion, but I don't know if DESeq2 can do this, I know that edgeR used to be able to do this, but this seems sub-optimal.

The situation is a little better in the case of your comparison three, because one of the groups has replicates, while the other doesn't, and it should be possible to estimate dispersion from the family corrected normals and assume that this disperision is the same for the affecteds, but its still not ideal.

What is the research questions you are trying to answer? Why do you need to know about the families independently?

ADD REPLY • link 3.0 years ago by i.sudbery 19k

score 5 · Accepted Answer · 2021-05-03

5

Entering edit mode

3.0 years ago

Carlo Yague 8.7k

It is not always easy to get all contrast of interest with a single design. Here I show you how to get contrasts [1] and [2] with one design, and contrast [3] with another. Smarter people might know how to get all three with a single design.

First I create some data similar to yours:

dds <- makeExampleDESeqDataSet(m=21)
dds$family=as.factor(rep(c(1:7), each=3))
dds$condition=factor(rep(c("A","N","N"), 7), levels=c("N","A"))
dds$sex=as.factor(sample(c("F","M"),21, replace=T))

Then I formulate the first design, which allows to write contrasts [1] and [2] rather easily (no need for a grouping variable here).

design(dds) <- ~ family*condition
dds <- DESeq(dds)

# condition A vs N : 2 possibilities: either
results(dds,c("condition", "A","N")) # or
results(dds,name="condition_A_vs_N")

# pairwise comparison between families
results(dds,c("family","2", "1"))
results(dds,c("family","3", "1")) # etc...

Contrasts in [3] are more tricky... Here is one solution using a grouping factor :

# every A vs overal N --> need a grouping factor
groups=paste0(rep(c("A","N","N"), 7),rep(c(1:7), each=3))
groups[grep("N", groups)]="N"
groups=as.factor(groups)
dds$group=relevel(groups, "N")

# for the design, we use the group, but we can also control for family effects.
design(dds) <- ~ group + family
dds <- DESeq(dds)

# then use the group to specify contrasts
results(dds,c("group", "A1","N"))
results(dds,c("group", "A2","N")) # etc...

Here, I don't think that you really want LRT since you are interested in pairwise comparisons (not all at once).

ADD COMMENT • link 3.0 years ago by Carlo Yague 8.7k

0

Entering edit mode

@Carlo Yague Thanks for the detailed explanation and guidance.

ADD REPLY • link 3.0 years ago by bioinfo8 ▴ 230

0

Entering edit mode

@Carlo Yague As you mentioned, contrast 2 as:

# pairwise comparison between families
results(dds,c("family","2", "1"))

I did not mean overall families, but between affected samples from different families: A1_vs_A2, A1_vs_A3, ...

So, I tried to use group for that using another design (design 2):

dds$group <- factor(paste0(dds$condition,dds$family)) 
design(dds) <- ~ group 
dds <- DESeq
resultsNames(dds)

[1] "Intercept"      "group_N2_vs_N1" "group_N3_vs_N1" "group_N4_vs_N1" "group_N5_vs_N1" "group_N6_vs_N1" "group_A1_vs_N1" "group_A2_vs_N1"
[9] "group_A3_vs_N1" "group_A4_vs_N1" "group_A5_vs_N1" "group_A6_vs_N1"

results(dds, name = "group_A1_vs_A2")              
Error: subscript contains invalid names

But, contrast option gives me output:

results(dds,c("group","A1", "A2"))    

log2 fold change (MLE): group A1 vs A2 
Wald test p-value: group A1 vs A2 
DataFrame with 22200 rows and 6 columns

I don't understand this, the contrast option gave me output but the name option did not.

For 3) Comparisons between affected samples from different families vs overall normal: A1_vs_N, A2,_vs_N, ..

Another design (suggested in the previous reply), not helpful:

dds$group <- factor(paste0(dds$condition,dds$family)) 
design(dds) <- ~ group + family
dds <- DESeq(dds)
resultsNames(dds) 

[1] "Intercept"     "family_2_vs_1" "family_3_vs_1" "family_4_vs_1" "family_5_vs_1" "family_6_vs_1" "condition_A_vs_N"   "family2.conditionA" "family3.conditionA"
[10] "family4.conditionA" "family5.conditionA" "family6.conditionA"

So, will need three designs for three contrasts?

Please guide me in what I am doing wrong.

Thanks.

ADD REPLY • link 3.0 years ago by bioinfo8 ▴ 230

2

Entering edit mode

I did not mean overall families, but between affected samples from different families: A1_vs_A2, A1_vs_A3, ...

Ok, I misread this.

I don't understand this, the contrast option gave me output but the name option did not.

It is because "group_A1_vs_A2" is not in resultsName(dds).

Another design (suggested in the previous reply), not helpful:

What do you mean it is not helpful ? the code I suggested in my answer above show how to aggregate Nx condition to a unique N level, which allow to make the comparison you want. Give it a try. It should output that kind of resultsNames:

resultsNames(dds)
 [1] "Intercept"     "group_A1_vs_N" "group_A2_vs_N" "group_A3_vs_N" "group_A4_vs_N"
 [6] "group_A5_vs_N" "group_A6_vs_N" "group_A7_vs_N" "family_2_vs_1" "family_3_vs_1"
[11] "family_4_vs_1" "family_5_vs_1" "family_6_vs_1" "family_7_vs_1"

ADD REPLY • link 3.0 years ago by Carlo Yague 8.7k

0

Entering edit mode

How would you run LRT on this single condition ?

ADD REPLY • link 2.1 years ago by 1769mkc ★ 1.2k

score 2 · Accepted Answer · 2021-05-04

2

Entering edit mode

3.0 years ago

swbarnes2 14k

I don't think it's meaningful to compare a single sample to anything. You would need more samples from each family to be able to say anything about family specific differences. You can compare A to N, and that's about it.

ADD COMMENT • link 3.0 years ago by swbarnes2 14k

2

Entering edit mode

I suppose it can still be meaningful (or at least useful) in exploratory analysis, but yes, the dataset is underpowered to study interaction between condition and family.

ADD REPLY • link 3.0 years ago by Carlo Yague 8.7k