Question

Running DESeq2 with all samples vs. running it separately on the samples for each comparison

3 months ago

manuelmourato25 • 0

Hello,

I am currently using DESeq2 to perform differential expression analysis on my data. I have feature count data for 24 samples and 4 comparisons to make (each comparison has 3 control samples and 3 knockout samples).

The issue is this: when I run DESeq2 on all of the samples and specify all 4 comparisons, I get very different fold changes and p-values than when I split the feature count data into 4 files, one per comparison, and run DESeq2 on each comparison separately.

One thing I noticed is that the p-value distribution peaks near 0 when I run a single comparison's data on its own, but peaks near 1 when I use the data from all of the samples (4 comparisons).

Why does this make such a big difference, and what best practice should I follow for these analyses?

Thank you

Deseq2 DGE • 457 views

score 0 · Answer 1 · 2024-01-18

3 months ago

swbarnes2 14k

In general, it's preferable to include all your samples in the dds object, for better size normalization and dispersion estimates. But if PCA shows that your sample groups differ widely then sometimes this doesn't make sense.
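To make the dispersion trade-off concrete, here is a rough sketch in plain Python. It is not DESeq2's method: DESeq2 fits negative binomial dispersions with shrinkage, whereas this uses an ordinary pooled within-group variance as a simplified stand-in. The group names and log-scale values are made up for illustration.

```python
from statistics import mean

def pooled_within_group_variance(groups):
    """Return (pooled variance, residual degrees of freedom).

    groups maps a group name to its replicate values for ONE gene.
    Simplified normal-theory analogue of a per-gene dispersion estimate.
    """
    sum_sq = 0.0
    n_total = 0
    for values in groups.values():
        m = mean(values)
        sum_sq += sum((v - m) ** 2 for v in values)
        n_total += len(values)
    resid_df = n_total - len(groups)
    return sum_sq / resid_df, resid_df

# Hypothetical log-scale expression of one gene (assumed values).
# Tissue A replicates are tight; tissue B replicates are noisy.
all_groups = {
    "A_control": [5.0, 5.1, 4.9],
    "A_knockout": [6.0, 6.1, 5.9],
    "B_control": [2.0, 3.0, 4.0],
    "B_knockout": [1.0, 2.0, 3.0],
}
pair_only = {k: v for k, v in all_groups.items() if k.startswith("A_")}

var_pair, df_pair = pooled_within_group_variance(pair_only)
var_all, df_all = pooled_within_group_variance(all_groups)

print("A pair alone:", var_pair, "on", df_pair, "df")
print("all groups:  ", var_all, "on", df_all, "df")
```

Pooling all groups doubles the residual degrees of freedom here (8 instead of 4), which is the benefit of keeping everything in one object. The cost in this toy case is that the noisy tissue inflates the single per-gene variance used for the tissue A comparison, which is one way a combined analysis can lose significance when groups differ widely.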

Thank you for your reply. The cells in each group differ in both cell type and the tissue from which they were collected, so in the PCA the groups are very far apart. Do you think this could explain the huge discrepancy that I am seeing? Can you recommend any literature on this topic? Thank you

Reply • 3 months ago by manuelmourato25 • 0

I would not include samples of different tissue types together. The assumptions underlying library size normalization might be violated.
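As a rough illustration of how that assumption can break, here is a small plain-Python sketch of median-of-ratios size factors, the normalization approach DESeq2 uses by default. The function is a simplified re-implementation for illustration, not DESeq2 code, and the counts are an invented, deliberately extreme toy example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts is a list of genes, each a list of per-sample counts.
    Genes with a zero in any sample are skipped, since their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    log_ratios = [[] for _ in range(n_samples)]
    for gene in usable:
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical counts; columns are a1, a2 (tissue A) and b1, b2 (tissue B).
# Only the third gene differs between a1 and a2; most genes differ by tissue.
counts_all = [
    [100, 100, 10, 10],
    [200, 200, 2000, 2000],
    [50, 100, 50, 50],
    [400, 400, 40, 40],
    [80, 80, 800, 800],
]
counts_a = [row[:2] for row in counts_all]

sf_alone = size_factors(counts_a)
sf_together = size_factors(counts_all)
ratio_alone = sf_alone[0] / sf_alone[1]
ratio_together = sf_together[0] / sf_together[1]

print("a1/a2 size factor ratio, tissue A alone:", ratio_alone)
print("a1/a2 size factor ratio, both tissues:  ", ratio_together)
```

With tissue A alone, a1 and a2 get equal size factors and the third gene shows up as a twofold change. With both tissues together, the tissue differences push every other gene to the extremes, the median lands on the one gene that changed within tissue A, and that change is absorbed into the size factors (the ratio becomes about 0.5). Real data will rarely be this extreme, but it shows why the "most genes are unchanged" assumption matters.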

Reply • 3 months ago by swbarnes2 14k