Question: [DESeq2 multiple treatments vs. 1-by-1] Inconsistent p-adj
6.4 years ago, madkitty wrote:

So we have RNA-seq data for 3 treatments (A, B and C), each of which should be compared to control (A vs. CT, B vs. CT, C vs. CT). From these comparisons we want the number of differentially expressed genes (padj < 0.1) and which genes are upregulated and downregulated.

When I ran the DESeq2 pipeline on a table containing control (CT) and the 3 treatments, we found about 1,000 differentially expressed genes (padj < 0.1), but the result CSV spreadsheet had only one log2 fold change column and one padj column, where I was expecting a log2 fold change and padj column for each comparison (A vs. CT, B vs. CT and C vs. CT). Since I couldn't extract the log2 fold change and padj for each comparison, I re-ran the DESeq2 pipeline on each treatment vs. control separately, and now the number of differentially expressed genes is completely different. I obtained:

A vs CT : 1000 genes

B vs CT:  2000 genes

C vs CT : 2500 genes

In total that's far beyond the original 1,000 genes I had when running DESeq2 with the 3 treatments vs Control in one spreadsheet.

  1. What causes this difference?
  2. And now I'm wondering which pipeline is the right one?
  3. Should I run it independently or all treatments together on one spreadsheet?
  4. If they are run together, how can I extract the padj and log2 fold changes for each treatment?





You'll need to show the code you used in the first vs. one of the second cases for us to help. My guess is that in the first instance you ended up comparing the full model against something like ~1, which isn't what you want. It's completely possible to extract fold-changes and adjusted p-values while keeping everything in. In fact, you'll get more reliable results that way, due to better variance estimation.
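
To illustrate the guess above, here is a minimal sketch in R of how a likelihood ratio test against `~ 1` differs from a per-treatment Wald test. The simulated counts from `makeExampleDESeqDataSet()`, the column name `condition`, and the layout of 3 replicates per group are assumptions for illustration, not the poster's actual data or code.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption): 500 genes, 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical layout: 3 replicates each of control (CT) and treatments A, B, C.
# Listing CT first makes it the reference level.
dds$condition <- factor(rep(c("CT", "A", "B", "C"), each = 3),
                        levels = c("CT", "A", "B", "C"))
design(dds) <- ~ condition

# Likelihood ratio test: full model (~ condition) against a reduced model (~ 1).
# Each gene gets ONE p-value for "any difference among the four groups".
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

# Wald test: one coefficient per treatment, so each comparison can be
# pulled out separately with the contrast argument.
dds_wald <- DESeq(dds)
res_A_vs_CT <- results(dds_wald, contrast = c("condition", "A", "CT"))

# Number of genes passing padj < 0.1 under each test.
sum(res_lrt$padj < 0.1, na.rm = TRUE)
sum(res_A_vs_CT$padj < 0.1, na.rm = TRUE)
```

The LRT table carries a single padj per gene that answers "does condition matter at all?", and the log2 fold change shown alongside it belongs to only one coefficient. That would be consistent with seeing just one log2 fold change and one padj column in the combined run.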

Comment written 6.4 years ago by Devon Ryan
6.4 years ago, Michael Love (United States) wrote:

You want to use the "contrast" argument to the results() function, in order to build a results table for your three desired contrasts. You can run DESeq() on the dataset containing all the samples, which can improve variance estimation as Devon mentioned.

First read over the section on contrasts in the DESeq2 vignette and the help for the "contrast" argument in ?results.

Also, as Devon suggested, it helps to provide your code to get more precise answers to your questions.
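
As a rough sketch of the contrast approach described above, the following R code fits the model once on all samples and then builds one results table per treatment-vs-control comparison. The simulated data from `makeExampleDESeqDataSet()`, the column name `condition`, the level names, and the object names `res_list` and `wide` are assumptions for illustration. With real data the `dds` object would come from your own count matrix and sample table.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption): 1000 genes, 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical layout: 3 replicates each of control (CT) and treatments A, B, C.
dds$condition <- factor(rep(c("CT", "A", "B", "C"), each = 3),
                        levels = c("CT", "A", "B", "C"))
design(dds) <- ~ condition

# Fit once on all samples so dispersion estimates use every group.
dds <- DESeq(dds)

# One results table per contrast: c(factor name, numerator, denominator).
treatments <- c("A", "B", "C")
res_list <- lapply(treatments, function(trt) {
  results(dds, contrast = c("condition", trt, "CT"), alpha = 0.1)
})
names(res_list) <- treatments

# Number of genes with padj < 0.1 in each comparison.
sapply(res_list, function(r) sum(r$padj < 0.1, na.rm = TRUE))

# Side-by-side table with a log2FoldChange and padj column per comparison.
wide <- do.call(cbind, lapply(treatments, function(trt) {
  df <- as.data.frame(res_list[[trt]])[, c("log2FoldChange", "padj")]
  colnames(df) <- paste0(trt, "_vs_CT_", colnames(df))
  df
}))
head(wide)
```

The `wide` data frame has the layout the question was expecting: a log2 fold change and padj column for each of A vs. CT, B vs. CT and C vs. CT, which can then be written out with `write.csv()`.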


I have the same or a similar doubt. I'm comparing multiple cell types, each with its own control sample based on the cell hierarchy. For normalisation I use all the samples (for PCA, clustering, correlation, etc.), but for differential expression I have to run each comparison separately: stem cell vs. progenitor, common myeloid progenitor vs. granulocyte-monocyte progenitor (GMP), then GMP vs. monocyte.

It's a conceptual doubt, since my control is not always the same: in one case it's the stem cell and in another it's the progenitor cell. So how can I make multiple comparisons in this case? Each comparison will have a different fold change and p-value unless my control is the same for every test.

I have used contrasts when comparing stem cells with everything downstream, but how do I do this when the controls are different?

Comment written 23 months ago by krushnach80870