DESeq2 - how to build design for multiple paired samples before/after with 2 conditions? (non-crossover)
Entering edit mode
4.2 years ago
jahorst • 0

I'm not sure the GLM in DESeq2 can/should do this, but I've seen posts of others doing so with less complex/robust experimental designs.

My problem is that I can't get a design to work that incorporates a patient identifier, to include between/within-patient variance. I see that vignette('DESeq2'), offers a solution for crossover design (the same patient gets each treatment): to repeat the individual label within each group. But this is a randomized placebo-controlled clinical trial without crossover.

We have 2 samples from each patient, 2 treatments (1 per patient), 2 timepoints (before,after). Total 4 samples per patient. I am blinded as to which treatment group is placebo vs active.

> samples
                            group    time pairs ind.Repeat ind.n
grA.1002.lesion1.before  2grA 1Before     a          1     1
grA.1002.lesion2.before  2grA 1Before     b          1     1
grA.1002.lesion1.after   2grA  2After     a          1     1
grA.1002.lesion2.after   2grA  2After     b          1     1
grB.1016.lesion1.before  1grB 1Before     a          2     2
grB.1016.lesion2.before  1grB 1Before     b          2     2
grB.1016.lesion1.after   1grB  2After     a          2     2
grB.1016.lesion2.after   1grB  2After     b          2     2
grA.2002.lesion1.before  2grA 1Before     a          3     3
grA.2002.lesion2.before  2grA 1Before     b          3     3
grA.2002.lesion1.after   2grA  2After     a          3     3
grA.2002.lesion2.after   2grA  2After     b          3     3
grB.2011.lesion1.before  1grB 1Before     a          1     4
grB.2011.lesion2.before  1grB 1Before     b          1     4
grB.2011.lesion1.after   1grB  2After     a          1     4
grB.2011.lesion2.after   1grB  2After     b          1     4
grA.3052.lesion1.before  2grA 1Before     a          2     5
grA.3052.lesion2.before  2grA 1Before     b          2     5
grA.3052.lesion1.after   2grA  2After     a          2     5
grA.3052.lesion2.after   2grA  2After     b          2     5
grB.3058.lesion1.before  1grB 1Before     a          3     6
grB.3058.lesion2.before  1grB 1Before     b          3     6
grB.3058.lesion1.after   1grB  2After     a          3     6
grB.3058.lesion2.after   1grB  2After     b          3     6

Certainly the most important comparisons in the model are: design=formula(~time*group) And this works.

However, I need help including the pairedness.

~ind.n + time*group does not work:

Error in checkFullRank(modelMatrix) : 
      the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Again, when making the ind.Repeat column redundant across the groups, and inputting design=formula(~ind.Repeat + time*group), it gives results, but comparisons are spuriously made with patient1 as the reference, without comparing patient2 to patient3, and there is a spurious combination across the treatment groups:

~ind.Repeat + time*group
[1] "Intercept"              "ind.Repeat_2_vs_1"      "ind.Repeat_3_vs_1"     
[4] "time_2After_vs_1Before" "group_2grA_vs_1grB"     "time2After.group2grA"

Of course the software is doing what is told, I'm not blaming the software, just trying to ask how to properly represent the experimental model.

I have tried everything I can think of (input as design=formula(*) into DESeqDataSetFromMatrix, prior to running DESeq). They all either fail, or seem to misrepresent the situation.

The best I have come up with that works:

~pairs:group + group*time
[1] "Intercept"              "group_2grA_vs_1grB"     "time_2After_vs_1Before"
[4] "group1grB.pairsb"       "group2grA.pairsb"       "group2grA.time2After"


~pairs:group:time + group*time
[1] "Intercept"                    "group_2grA_vs_1grB"          
[3] "time_2After_vs_1Before"       "group2grA.time2After"        
[5] "group1grB.time1Before.pairsb" "group2grA.time1Before.pairsb"
[7] "group1grB.time2After.pairsb"  "group2grA.time2After.pairsb"

Again, this runs, and I believe makes sense. I am challenged by wrapping my head around a lack of "pairsA" being directly represented - seems okay as this would be the reference to analyze "pairsB." I was hoping to have "time1Before" be the reference in this 3-way combination, but haven't figured out how to do that.

And, of course the "group2Treatment.time2After" should come last, but again haven't figured out how to make it so. Yet I can just insert this into contrast:




This seems like it might be okay... what do y'all think? Do I need betaPrior=FALSE?

One final idea is to combine the group variables, then set the contrasts manually:

lesion_samples$SGroup <- factor(paste0(lesion_samples$group,lesion_samples$time,lesion_samples$ind.n))
lesion_Data <- DESeqDataSetFromMatrix(countData = lesion_CountTable, colData=lesion_samples, design=formula(~SGroup))
dds <- DESeq(lesion_Data)
> resultsNames(lesion_DESeq)
 [1] "Intercept"          "SGroup1grB1Before2" "SGroup1grB1Before4"
 [4] "SGroup1grB1Before6" "SGroup1grB2After2"  "SGroup1grB2After4" 
 [7] "SGroup1grB2After6"  "SGroup2grA1Before1" "SGroup2grA1Before3"
[10] "SGroup2grA1Before5" "SGroup2grA2After1"  "SGroup2grA2After3" 
[13] "SGroup2grA2After5"


Further details: - this happens to be taxonomic calls (~microbiome data).

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DESeq2_1.14.1              SummarizedExperiment_1.4.0
[3] Biobase_2.34.0             GenomicRanges_1.26.2      
[5] GenomeInfoDb_1.10.2        IRanges_2.8.1             
[7] S4Vectors_0.12.1           BiocGenerics_0.20.0
DESEq2 DESeq Paired samples • 3.1k views
Entering edit mode
4.2 years ago
Michael Love ★ 2.2k

I prefer to answer DESeq2 questions on the official Bioconductor software support site, because this is where I tell users to look to see if we've answered similar questions already. It's in the documentation, etc.

Would you mind re-posting there, and can you rename the variables in the colData so it's easier to see what you're talking about? I don't follow the design from what you've posted.

You can just name the columns of the colData: patient, treatment, time.

Also, best to not include spaces in the column names or in the factor levels. Some of the base R functions can't handle spaces in variable names, so it's safer to avoid this entirely when using R.

Entering edit mode
4.2 years ago
jahorst • 0

Okay, thanks Prof. Love, done.

btw - there were no spaces in variable names. There are "."s which I believe R can handle. The main issue is how to account for 2 samples coming from each patient at each timepoint.

1 more question: is it also your understanding that GLMs (i.e. SESeq2) cannot take into account paired data? For example, specifically relating / comparing the before & after of each sample to each other? For example, as is done in paired T-tests.


Login before adding your answer.

Traffic: 1216 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6