I am running a DESeq2 analysis with one factor (Status) that has several levels (A to E), and I want to compare each level against E. I have done this in two different ways.
First, by putting all the samples in a single DESeq object and then extracting each comparison:
> sampleinfo
    FileName    SampleName  Status
    A_1_count   A_1         A
    A_2_count   A_2         A
    B_3_count   B_3         B
    B_4_count   B_4         B
    C_5_count   C_5         C
    C_6_count   C_6         C
    D_7_count   D_7         D
    D_8_count   D_8         D
    E_9_count   E_9         E
    E_10_count  E_10        E
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = sampleinfo,
                              design = ~ Status)
dds$Status <- relevel(dds$Status, ref = "E")
And the results:
dds <- DESeq(dds)
res_A <- results(dds, name = "Status_A_vs_E")
res_B <- results(dds, name = "Status_B_vs_E")
res_C <- results(dds, name = "Status_C_vs_E")
res_D <- results(dds, name = "Status_D_vs_E")
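As a sketch of the same single-model approach, the comparisons against E can also be extracted in a loop with `contrast`, which avoids typing each coefficient name. This assumes the DESeq2 package is installed; because `cts` and `sampleinfo` are not shown, the counts are simulated with DESeq2's `makeExampleDESeqDataSet()` (10 samples in 5 groups of two, as in the table above). The data are illustrative only, not the poster's.

```r
# Sketch on simulated counts (assumption: stands in for cts/sampleinfo)
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# 1000 genes x 10 samples of simulated counts
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)
# Five groups of two samples, mirroring A_1/A_2 ... E_9/E_10
dds$Status <- factor(rep(c("A", "B", "C", "D", "E"), each = 2))
dds$Status <- relevel(dds$Status, ref = "E")
design(dds) <- ~ Status

# One fit: size factors and dispersions are estimated from all 10 samples
dds <- DESeq(dds, quiet = TRUE)
print(resultsNames(dds))

# Compare every non-reference level with E
others <- setdiff(levels(dds$Status), "E")
res_list <- lapply(setNames(others, others), function(lvl) {
  results(dds, contrast = c("Status", lvl, "E"))
})
```

Since E is the reference level, `res_list$A` should carry the same estimates as `results(dds, name = "Status_A_vs_E")`.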
Second, by doing the comparisons one at a time, each on its own DESeq object:
> sampleinfo_A
    FileName    SampleName  Status
    A_1_count   A_1         A
    A_2_count   A_2         A
    E_9_count   E_9         E
    E_10_count  E_10        E
> sampleinfo_B
    FileName    SampleName  Status
    B_3_count   B_3         B
    B_4_count   B_4         B
    E_9_count   E_9         E
    E_10_count  E_10        E
dds_A <- DESeqDataSetFromMatrix(countData = cts_A,
                                colData = sampleinfo_A,
                                design = ~ Status)
dds_B <- DESeqDataSetFromMatrix(countData = cts_B,
                                colData = sampleinfo_B,
                                design = ~ Status)
And the results:
dds_A <- DESeq(dds_A)
res_A <- results(dds_A)
dds_B <- DESeq(dds_B)
res_B <- results(dds_B)
(Repeated for each remaining condition, C and D.)
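A sketch of the pairwise approach, again on simulated counts from `makeExampleDESeqDataSet()` as a stand-in for `cts` and `sampleinfo`: each pair is cut from one unfitted DESeq object instead of building `cts_A`, `cts_B`, and so on by hand. `fit_pair` is a hypothetical helper, not a DESeq2 function. The `droplevels()` call matters: without it the three unused Status levels would leave all-zero columns in the design matrix, so the model would not be full rank.

```r
# Sketch of the pairwise approach (simulated data; fit_pair is hypothetical)
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds_all <- makeExampleDESeqDataSet(n = 1000, m = 10)
dds_all$Status <- factor(rep(c("A", "B", "C", "D", "E"), each = 2))
dds_all$Status <- relevel(dds_all$Status, ref = "E")
design(dds_all) <- ~ Status

# Fit one group against E using only those four samples
fit_pair <- function(dds, lvl, ref = "E") {
  sub <- dds[, dds$Status %in% c(lvl, ref)]
  # Drop the unused levels; E stays first, so it remains the reference
  sub$Status <- droplevels(sub$Status)
  sub <- DESeq(sub, quiet = TRUE)
  results(sub, name = paste0("Status_", lvl, "_vs_", ref))
}

others <- setdiff(levels(dds_all$Status), "E")
res_pairs <- lapply(setNames(others, others),
                    function(lvl) fit_pair(dds_all, lvl))
```

Each fit here sees only four samples, so size factors and dispersions are re-estimated per pair, and the results need not match those from the single model.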
However, the two methods give different results. Does anyone know why this happens? What is the correct way to compare every group with E?
Thank you!
If I get better dispersion estimates, is the variance of gene expression for a given mean value captured more accurately? So you think this is the better way to compare all groups against E?
Generally speaking, that would be the best strategy. There are only a few cases where splitting the dataset before dispersion estimation might be preferable (see ATpoint's answer).
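To see where the two approaches diverge, here is a sketch (again on simulated counts from `makeExampleDESeqDataSet()`, not the poster's data) that fits the full model and the A-vs-E subset and compares their residual degrees of freedom, dispersion estimates and size factors. The full model has 10 samples and 5 coefficients (5 residual degrees of freedom); the subset has 4 samples and 2 coefficients (2 residual degrees of freedom), so its dispersion estimates rest on less information.

```r
# Sketch comparing full vs subset fits (simulated data, illustrative only)
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)
dds$Status <- factor(rep(c("A", "B", "C", "D", "E"), each = 2))
dds$Status <- relevel(dds$Status, ref = "E")
design(dds) <- ~ Status

# Subset before fitting so nothing from the full fit is carried over
keep <- dds$Status %in% c("A", "E")
dds_sub <- dds[, keep]
dds_sub$Status <- droplevels(dds_sub$Status)

dds_full <- DESeq(dds, quiet = TRUE)
dds_sub <- DESeq(dds_sub, quiet = TRUE)

# Residual degrees of freedom = samples - model coefficients
df_full <- ncol(dds_full) - length(resultsNames(dds_full))
df_sub <- ncol(dds_sub) - length(resultsNames(dds_sub))
cat("residual df, full:", df_full, " subset:", df_sub, "\n")

# Gene-wise ratio of final dispersion estimates (full / subset)
disp_ratio <- dispersions(dds_full) / dispersions(dds_sub)
print(summary(log2(disp_ratio)))

# Size factors of the shared samples can also change, because the
# median-of-ratios reference is built from a different set of samples
print(cbind(full = sizeFactors(dds_full)[keep],
            subset = sizeFactors(dds_sub)))
```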
Thank you! Very helpful :)