Question: DESeq2: different log2FC results when using subset of data
weixiaokuan (United States) wrote 3.5 years ago:

Hi,

I have data with a four-level group factor: condition1, treat1, condition2, treat2.

I am trying to use DESeq2 to identify differentially expressed genes between two levels: treat1 vs condition1.

I tried two methods. First, I used only a subset of the data (condition1 and treat1) to get the differentially expressed genes between treat1 and condition1. Second, I used all the data, including condition1, treat1, condition2 and treat2, but extracted the differentially expressed genes between treat1 and condition1 with results(dds, contrast=c("group","treat1","condition1")).

Interestingly, the identified differentially expressed genes are completely different. When I checked the size factors for each sample under the two methods, they were also completely different. Is this expected behavior for the DESeq2 pipeline? If so, which method should I use?

Thank you.

rna-seq deseq2

I would actually expect the size factors to differ between the first test and the second, mainly because the second run includes more samples (condition2 and treat2), and size factors are estimated relative to all samples in the data set. Put it this way: if the total number of reads is 10 in condition1 and 20 in treat1, then the scaling factors that bring both libraries to the same depth could be 2 and 1, or something along those lines. If I then add condition2 with about 15 reads and treat2 with about 25, the factors could become 2.5, 1.25, 5/3 and 1 for condition1, treat1, condition2 and treat2 respectively. This is a very naive example (DESeq2 actually estimates size factors by the median-of-ratios method and divides counts by them, rather than multiplying), but it conveys why the size factors can differ.

As for the different results, I suspect it has to do with parameter estimation (e.g. the overdispersion estimates): with more samples the estimates will be somewhat different, which leads to different results. However, I am not 100% sure.
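
To make the size-factor point concrete, here is a minimal base-R sketch of the median-of-ratios idea. The helper `median_of_ratios` and the toy count matrix are made up for illustration; this is not DESeq2's own code, only an approximation of what its size-factor estimation does for a plain count matrix.

```r
# Hypothetical helper: median-of-ratios size factors for a genes x samples matrix.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean, taken over the samples that are present
  log_geo_mean <- rowMeans(log_counts)
  # Genes with a zero count in any sample have a -Inf mean and are skipped
  keep <- is.finite(log_geo_mean)
  # Size factor of a sample = median over genes of count / geometric mean
  apply(log_counts[keep, , drop = FALSE], 2, function(s) {
    exp(median(s - log_geo_mean[keep]))
  })
}

# Toy counts (assumed values): five genes, four samples
counts <- cbind(
  condition1 = c(10, 20, 30, 40, 50),
  treat1     = c(20, 40, 60, 80, 100),
  condition2 = c(15, 30, 45, 60, 75),
  treat2     = c(25, 50, 75, 100, 125)
)

sf_all <- median_of_ratios(counts)
sf_sub <- median_of_ratios(counts[, c("condition1", "treat1")])

print(round(sf_all, 3))  # size factors when all four samples are used
print(round(sf_sub, 3))  # size factors when only condition1 and treat1 are used
```

The per-gene reference (the geometric mean) depends on which samples are included, so the size factors of condition1 and treat1 change when condition2 and treat2 are added. In this perfectly proportional toy matrix the ratio between treat1 and condition1 stays at 2 in both cases; with real data that is not guaranteed.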

written 3.5 years ago by Sam

> ...the identified differentially expressed genes are completely different.

By what measure? Do they not even rank highly when compared between methods? Are you simply comparing the overlaps at a given cutoff? What if you plot the p-values for all genes as determined by method 1 vs method 2, is there any correlation? One can expect *some* difference between methods because variance for each gene in the entire data set will be different than in only a partial data set. But the degree of difference may depend on how different your conditions are. Are there huge differences in the data between conditions by other measures?
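
One way to answer these questions is to run both analyses and compare them gene by gene instead of comparing gene lists at a cutoff. The sketch below assumes simulated counts from `makeExampleDESeqDataSet` as a stand-in for real data, with three replicates per group; the object names (`cts`, `coldata`, `res_all`, `res_sub`) are illustrative only.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for real data (assumption: 3 replicates per group)
cts <- counts(makeExampleDESeqDataSet(n = 1000, m = 12))
coldata <- data.frame(
  group = factor(rep(c("condition1", "treat1", "condition2", "treat2"), each = 3)),
  row.names = colnames(cts)
)

# Method 2: fit one model on all samples, then extract the contrast
dds_all <- DESeq(DESeqDataSetFromMatrix(cts, coldata, design = ~ group))
res_all <- results(dds_all, contrast = c("group", "treat1", "condition1"))

# Method 1: fit the model on the condition1/treat1 subset only
keep <- coldata$group %in% c("condition1", "treat1")
sub_coldata <- droplevels(coldata[keep, , drop = FALSE])
dds_sub <- DESeq(DESeqDataSetFromMatrix(cts[, keep], sub_coldata, design = ~ group))
res_sub <- results(dds_sub, contrast = c("group", "treat1", "condition1"))

# Size factors of the shared samples under each method
print(cbind(all = sizeFactors(dds_all)[keep], subset = sizeFactors(dds_sub)))

# Gene-by-gene agreement between the two methods
ok <- !is.na(res_all$pvalue) & !is.na(res_sub$pvalue)
print(cor(res_all$pvalue[ok], res_sub$pvalue[ok], method = "spearman"))
print(cor(res_all$log2FoldChange[ok], res_sub$log2FoldChange[ok]))
plot(-log10(res_all$pvalue[ok]), -log10(res_sub$pvalue[ok]),
     xlab = "-log10 p-value (all samples)",
     ylab = "-log10 p-value (subset only)")
```

If the rank correlation is high but the overlap at a fixed cutoff is low, the two methods mostly agree and the difference comes from genes sitting near the threshold.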

written 3.5 years ago by seidel

Are condition1 and condition2 biological replicates? Likewise, are treat1 and treat2 biological replicates? Or do these represent 4 different groups (with or without biological replicates)?

written 3.5 years ago by h.mon

Friends,

Thank you for your kind answers. I now realize that this is expected; I did my due diligence and found some other posts discussing similar questions. So I'll use all the samples to build the model and extract the different comparisons using contrasts.
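
For readers following the same route, a minimal sketch of pulling several comparisons out of one fitted model might look like this. The counts are simulated and the names in `comparisons` are hypothetical labels, not anything from the original analysis.

```r
library(DESeq2)

set.seed(2)
# Simulated counts stand in for real data (assumption: 3 replicates per group)
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 12))
coldata <- data.frame(
  group = factor(rep(c("condition1", "treat1", "condition2", "treat2"), each = 3)),
  row.names = colnames(cts)
)

# Fit the model once on all samples
dds <- DESeq(DESeqDataSetFromMatrix(cts, coldata, design = ~ group))

# Each contrast is c(factor name, numerator level, denominator level)
comparisons <- list(
  treat1_vs_condition1 = c("group", "treat1", "condition1"),
  treat2_vs_condition2 = c("group", "treat2", "condition2")
)
res_list <- lapply(comparisons, function(ctr) results(dds, contrast = ctr))

# Number of genes with adjusted p-value below 0.05 in each comparison
print(sapply(res_list, function(res) sum(res$padj < 0.05, na.rm = TRUE)))
```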

-X

written 3.5 years ago by weixiaokuan
Michael Love (United States) wrote 3.5 years ago:

This is not surprising, because the model estimates (size factors, dispersions, priors, etc.) use all samples. I suggest exploring a PCA plot to see the relationships between samples (see the DESeq2 vignette).
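
A PCA plot along these lines can be sketched as follows, assuming simulated counts and a `group` column in the sample table (both are placeholders for the real data). `varianceStabilizingTransformation` is used here rather than `vst` only because the toy data set is small.

```r
library(DESeq2)

set.seed(3)
# Simulated counts stand in for real data (assumption: 3 replicates per group)
cts <- counts(makeExampleDESeqDataSet(n = 1000, m = 12))
coldata <- data.frame(
  group = factor(rep(c("condition1", "treat1", "condition2", "treat2"), each = 3)),
  row.names = colnames(cts)
)
dds <- DESeqDataSetFromMatrix(cts, coldata, design = ~ group)

# Transform counts so that variance is roughly independent of the mean
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# PCA coordinates per sample, then the plot itself
pca <- plotPCA(vsd, intgroup = "group", returnData = TRUE)
print(head(pca))
print(plotPCA(vsd, intgroup = "group"))
```

If condition2 and treat2 sit far from condition1 and treat1 on the plot, the groups differ strongly, and estimates pooled across all samples may then differ noticeably from those of a subset-only fit.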
