Question: DESeq2 for pairwise comparison of multiple groups
thjnant (Germany) wrote, 4 days ago:

Hello,

I have 4 different groups (species), which I call A, B, C and D, and I want to examine differential gene expression between them.

I have 5 - 8 replicates for each group and I am using DESeq2 for the analysis.

I have run into a result that I cannot interpret.

I first made a separate data frame for each pairwise comparison, that is A vs B, A vs C, etc. I then created a dds object for each pairwise comparison and called the DESeq function on it. Using results, I obtained the number of significantly differentially expressed genes.

I then learnt about the contrast option. So, this time I made a single dds object using all groups A, B, C and D and ran the DESeq function on it. I then used the results function with contrast to get the output of each pairwise comparison, A vs B, A vs C, etc.

I get different numbers of differentially expressed genes from the two approaches. Why is that the case?

Thank you!

PS: I have also posted this question on the Bioconductor forum: https://support.bioconductor.org/p/131229/#131235

rna-seq deseq2 R • 80 views
modified 3 days ago • written 4 days ago by thjnant

Cross-posted on Bioconductor: https://support.bioconductor.org/p/131229/

thjnant, when you cross-post in future, could you mention it in your question?

written 4 days ago by Kevin Blighe

So sorry for cross-posting. I mentioned it in my post on the Bioconductor forum. I will now add it to my question here too.

written 3 days ago by thjnant

Sure thing. Oh, it's no problem - just helps so that users do not duplicate efforts.

modified 3 days ago • written 3 days ago by Kevin Blighe

Asaf (Israel) wrote, 4 days ago:

Probably it's because, with all the groups in one model, you have better estimates of the variation of each gene. Take a look at the gene lists: they shouldn't be too different. If they are, then something is wrong.
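To make the point about variance estimates concrete, here is a toy sketch in plain Python. It is not DESeq2's method (DESeq2 fits a negative binomial dispersion per gene with shrinkage); it uses an ordinary pooled within-group variance as a stand-in, and all values and names are hypothetical. It shows two things: adding groups gives more residual degrees of freedom for the per-gene estimate, and if the extra groups are noisier for that gene, the single shared estimate goes up, which is one possible reason for fewer significant A vs B genes in the combined analysis.

```python
from statistics import variance

def pooled_variance(groups):
    """Pooled within-group variance and its residual degrees of freedom."""
    sum_sq = sum((len(g) - 1) * variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return sum_sq / df, df

# Hypothetical log-scale expression values for ONE gene (not data from this thread).
A = [5.0, 5.2, 4.8, 5.1, 4.9]
B = [6.0, 6.1, 5.9, 6.2, 5.8]
C = [5.0, 7.0, 4.0, 6.5, 3.5]  # noisier group
D = [5.5, 7.5, 4.5, 6.0, 3.0]  # noisier group

# Estimate from the A/B subset only, versus from all four groups together.
var_pair, df_pair = pooled_variance([A, B])
var_all, df_all = pooled_variance([A, B, C, D])

print("A/B only  : variance", var_pair, "df", df_pair)
print("all groups: variance", var_all, "df", df_all)
```

In this made-up case the combined estimate is much larger than the A/B-only one, so the same A vs B difference would look less significant in the combined model. If C and D had variability similar to A and B, the estimate would stay about the same and the extra degrees of freedom would help instead.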

written 4 days ago by Asaf

Thank you for your reply. I compared the results of the two approaches. Of the top 50 most significant genes from each, 27 are shared. I detect more significant genes when I use a dataset containing only the pair of interest:

out of 13297 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 83, 0.62%
LFC < 0 (down)     : 132, 0.99%
outliers [1]       : 119, 0.89%
low counts [2]     : 1021, 7.7%
(mean count < 5)

But when I use the whole set and use contrast to get the comparison of interest, I have:

out of 13737 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 19, 0.14%
LFC < 0 (down)     : 30, 0.22%
outliers [1]       : 86, 0.63%
low counts [2]     : 1057, 7.7%

I think the second approach might be better because, as you mentioned, it gives a better estimate of the variation of each gene.

written 4 days ago by thjnant

When you process a group of samples together, DESeq2 estimates several parameters, including per-gene dispersions and per-sample size factors, and these estimates depend on all the samples in your dataset. The size factors are used to normalise the raw counts, and the dispersions are used when testing for differential expression.

So, if you subset your dataset and process the subsets independently, these key parameters will have different values. This is all that is happening. The genes that are genuinely differentially expressed should still appear in both analyses, unless you have some extreme outliers or some major batch effects.

written 3 days ago by Kevin Blighe
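As a rough illustration of why size factors depend on which samples are in the dataset, the sketch below implements the median-of-ratios idea (the default size-factor estimator in DESeq2) in plain Python. It is a simplified toy, not the package's code, and the sample names and counts are hypothetical. Each sample's counts are divided by the per-gene geometric mean across the samples present, and the median of those ratios is the size factor, so changing the sample set changes the reference and therefore the factors.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts maps sample name -> per-gene counts."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Use only genes with non-zero counts in every sample of this dataset.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    # Per-gene log geometric mean across the samples present.
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Size factor = median over genes of (count / geometric mean), taken on the log scale.
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g] for g in usable))
            for s in samples}

# Hypothetical counts for 5 genes in one sample per group (not data from this thread).
counts = {
    "A1": [100, 200, 50, 400, 30],
    "B1": [120, 180, 60, 800, 20],
    "C1": [90, 600, 55, 410, 300],
    "D1": [110, 700, 45, 390, 310],
}

sf_full = size_factors(counts)
sf_pair = size_factors({s: counts[s] for s in ("A1", "B1")})

# Relative scaling of A1 versus B1 under each normalisation.
ratio_full = sf_full["A1"] / sf_full["B1"]
ratio_pair = sf_pair["A1"] / sf_pair["B1"]
print("A1/B1 size-factor ratio, all samples:", ratio_full)
print("A1/B1 size-factor ratio, A/B subset :", ratio_pair)
```

The relative scaling of A1 against B1 differs between the two runs even though their raw counts are identical, because genes that behave differently in C1 and D1 shift the per-gene reference. The same raw A and B counts therefore end up normalised slightly differently in the two analyses.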