So we have RNAseq data for 3 treatments (A, B and C) that we compared to control. Each treatment should be compared to control (as in A vs. CT, B vs. CT, C vs. CT) and from then we should know the number of differentially expressed genes (padj < 0.1) and which genes are upregulated and downregulated.
When I run the DESeq2 pipeline with a table containing Control (CT) and the 3 treatments, we found about 1,000 diferentially expressed genes (padj < 0.1) but the result CSV spreadsheet only had one log2fold column and padj column, where I was expecting to have 3 column log2fold and padj for each comparison (A vs. CT, B vs. CT and C vs. CT). Since I couldn't extract the log2fold and padj for each comparison, I re-run the DESeq2 pipeline on each treatment vs control separately, and now the number of differentially expressed genes is completely different, I obtained:
A vs CT : 1000 genes
B vs CT: 2000 genes
C vs CT : 2500 genes
In total that's far beyond the original 1,000 genes I had when running DESeq2 with the 3 treatments vs Control in one spreadsheet.
- What causes this difference?
- And now I'm wondering which pipeline is the right one?
- Should I run it independently or all treatments together on one spreadsheet?
- If so, how can I extract the padj and log2 fold changes for each treatment if they are run together??