Question: DESeq2 results based on lfcThreshold
2 days ago, BS (0) wrote:

I have a question regarding the results of DESeq2.

With one dataset, changing lfcThreshold from 0 to 0.75 reduced the number of DEGs from 100 to 3. For another dataset, DESeq2 gave 215 DEGs with an lfcThreshold of 0 and 101 with an lfcThreshold of 0.75.

Why is the drop in DEGs so much larger in the first case? Is it because of the variation in the read counts among the replicate samples?

rna-seq • 76 views
modified 1 day ago • written 2 days ago by BS (0)

Hi BS,

here is the reason why the number of DEGs drops if you raise the lfcThreshold: link

written 1 day ago by andres.firrincieli (600)

Thank you Andres:) I understand the implication of using a strict lfcthreshold.

written 1 day ago by BS (0)

Thanks Andres - that's an important answer from James on Bioconductor to link to here. BS, I am not sure of your experience but, unless you have a good reason to modify the default value of lfcThreshold, it may be better to leave it at 0 and filter on fold change in the final results table. A typical cut-off there would be adjusted p < 0.05 and absolute log2FC > 2.

written 1 day ago by Kevin Blighe (61k)
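
As a rough sketch of the post-hoc filtering suggested above: assuming the DESeq2 results table has been exported from R and loaded as a list of rows, the cut-offs could be applied as below. The gene names and values are made up for illustration; `log2FoldChange` and `padj` are the column names DESeq2 uses, and `filter_degs` is a hypothetical helper, not part of any package.

```python
# Hypothetical rows mimicking a DESeq2 results table exported from R.
results = [
    {"gene": "geneA", "log2FoldChange": 3.1, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -2.4, "padj": 0.030},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.0001},  # significant, tiny FC
    {"gene": "geneD", "log2FoldChange": 5.0, "padj": 0.200},   # large FC, not significant
    {"gene": "geneE", "log2FoldChange": -2.2, "padj": None},   # padj is NA
]


def filter_degs(rows, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Return gene names passing both the padj and |log2FC| cut-offs."""
    kept = []
    for row in rows:
        padj = row["padj"]
        if padj is None:  # skip genes whose adjusted p-value is NA
            continue
        if padj < padj_cutoff and abs(row["log2FoldChange"]) > lfc_cutoff:
            kept.append(row["gene"])
    return kept


degs = filter_degs(results)
print(degs)
```

Here the test itself is still against a fold change of zero; the fold-change cut-off is only applied afterwards to the table.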

I personally test against a threshold of log2(1.2), simply to get rid of genes that are statistically significant but show tiny fold changes, which (in my head) are unlikely to actually drive any meaningful biological differences. A small threshold like 1.2 is (I think) better than post-hoc filtering for 1.5 or 2, because post-hoc filtering favours genes that are lowly expressed and therefore more prone to show large FCs (assuming you did not shrink the FCs with DESeq2). This is pretty much what the edgeR authors recommend.

written 1 day ago by ATpoint (36k)
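
To illustrate the difference between testing against a threshold and filtering afterwards, here is a minimal sketch using a plain Wald-style test on a normal approximation. It is a simplification for illustration only, not DESeq2's actual implementation: the genes, log2 fold changes and standard errors are invented, the function names are hypothetical, and no multiple-testing correction is applied.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_vs_zero(lfc, se):
    """Two-sided Wald p-value for the null hypothesis LFC = 0."""
    return 2.0 * upper_tail(abs(lfc) / se)


def p_vs_threshold(lfc, se, threshold):
    """P-value for the null hypothesis |LFC| <= threshold (capped at 1)."""
    return min(1.0, 2.0 * upper_tail((abs(lfc) - threshold) / se))


threshold = math.log2(1.2)  # the small threshold mentioned above

# Hypothetical genes: (name, log2 fold change, standard error)
genes = [
    ("precise_small", 0.20, 0.03),  # tiny but precisely estimated change
    ("precise_large", 1.50, 0.20),  # large and precisely estimated change
    ("noisy_large", 1.50, 0.70),    # large but noisy, e.g. low counts
]

for name, lfc, se in genes:
    print(name, p_vs_zero(lfc, se), p_vs_threshold(lfc, se, threshold))
```

In this toy example the noisy gene has p < 0.05 against zero and a large fold change, so it would survive post-hoc filtering, but it is no longer significant when the threshold is part of the test. The precisely estimated small change is significant against zero and not against the threshold. How many genes drop out when the threshold is raised therefore depends on each dataset's effect sizes and standard errors.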

Thank you very much, Kevin, for the answer :) The experimental design and the species studied are different. However, the data-prep and sequencing methods were similar. In DESeq2, if you specify the lfcThreshold, the results function will ignore the p-value. Note that I have used p < 0.05 in both cases.

written 1 day ago by BS (0)

The results function doesn't ignore the p-value.

written 1 day ago by Devon Ryan (95k)

Kevin has given the one and only good answer you can give here: there is a virtually infinite number of possible reasons. You have independent datasets, so the results will always be different. Statistical power, the true underlying biological effect, batch effects, sequencing depth, variance and dispersion, the number of genes surviving the independent filtering: it can be any of these. Results are not predictable; that is why you run experiments and apply rigorous statistics to get meaningful and reliable results. There is no simple answer for this.

written 1 day ago by ATpoint (36k)
2 days ago, Kevin Blighe (61k) wrote:

Each dataset that you process is going to return different results, for a virtually infinite number of reasons. This is the simplest explanation that I or anybody else can give without further specific information about these datasets. We would need to know, for example, whether the conditions under study are the same, as well as the sample sizes and both the data-prep and sequencing methods. Seeing the code that you ran when processing these would help, too. You have not mentioned anything about p-value cut-offs, either. You are correct, though, that variation in your replicates will be a key factor.


written 2 days ago by Kevin Blighe (61k)