DESeq2 results based on lfcThreshold
3.8 years ago
BS • 0

I have a question regarding the results of DESeq2.

With one dataset, changing lfcThreshold from 0 to 0.75 reduced the number of DEGs from 100 to 3. For another dataset, DESeq2 gave 215 DEGs with an lfcThreshold of 0 and 101 with an lfcThreshold of 0.75.

Why is the drop in DEGs larger in the first case? Is it because of the variation in read counts among the replicate samples?

RNA-Seq • 1.7k views

Hi BS,

here is the reason why the number of DEGs drops if you raise the lfcThreshold: link


Thank you, Andres :) I understand the implication of using a strict lfcThreshold.


Thanks, Andres. That is an important answer from James on Bioconductor to link to here. BS, I am not sure of your experience but, unless you have a good reason to modify the default value of lfcThreshold, it may be better to leave it at 0 and filter on fold change in the final results table. A typical cut-off there would be adjusted p < 0.05 and absolute log2FC > 2.


I personally test against a threshold of log2(1.2), simply to get rid of genes that are statistically significant but show tiny fold changes, which (in my head) are unlikely to drive any meaningful biological differences. A small fold-change threshold like 1.2 is (I think) better than post-hoc filtering for 1.5 or 2, because post-hoc filtering favours genes that are lowly expressed and therefore more prone to show large FCs (assuming you did not shrink the FCs with DESeq2). This is pretty much what the edgeR authors recommend.
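To make the contrast concrete, here is a minimal Python sketch of the arithmetic, not DESeq2 or edgeR code. It assumes a plain normal-approximation Wald test, uses raw rather than adjusted p-values, and the three genes (names, log2 fold changes, standard errors) are made up for illustration.

```python
import math

def pvalue(lfc, se, threshold=0.0):
    """Two-sided Wald p-value for H0: |LFC| <= threshold (normal approximation)."""
    z = (abs(lfc) - threshold) / se
    # 2 * P(Z > z) equals erfc(z / sqrt(2)); cap at 1 when |LFC| is below the threshold
    return min(1.0, math.erfc(z / math.sqrt(2.0)))

# Hypothetical genes: name -> (log2 fold change, standard error)
genes = {
    "high_expr_tiny_fc": (0.15, 0.02),      # precise estimate, negligible effect
    "high_expr_moderate_fc": (0.80, 0.05),  # precise estimate, moderate effect
    "low_expr_noisy_fc": (2.50, 1.00),      # large but noisy estimate
}

alpha = 0.05

# Strategy 1: test against 0, then filter post hoc for a 2-fold change
posthoc = {name for name, (lfc, se) in genes.items()
           if pvalue(lfc, se) < alpha and abs(lfc) > math.log2(2)}

# Strategy 2: test directly against a small threshold, log2(1.2)
tested = {name for name, (lfc, se) in genes.items()
          if pvalue(lfc, se, threshold=math.log2(1.2)) < alpha}

print("post-hoc filter:", sorted(posthoc))
print("threshold test: ", sorted(tested))
```

With these toy numbers the post-hoc filter keeps only the noisy, lowly expressed gene, while the threshold test also keeps the precisely measured moderate effect and drops the tiny one.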


Thank you very much, Kevin, for the answer :) The experimental design and the species studied are different. However, the data-prep and sequencing methods were similar. In DESeq2, if you specify the lfcThreshold, the results function will ignore the p-value. Note that I have used p < 0.05 in both cases.


The results function doesn't ignore the p-value.
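As a rough illustration of why the p-value still matters, the Python sketch below recomputes a Wald p-value against a non-zero threshold. It is a simplified normal-approximation version of a "greater than threshold in absolute value" test, not the DESeq2 source, and the gene's numbers are invented.

```python
import math

def wald_pvalue_greater_abs(lfc, se, threshold=0.0):
    """Two-sided p-value for H0: |LFC| <= threshold (normal approximation)."""
    z = (abs(lfc) - threshold) / se
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z)
    return min(1.0, 2.0 * upper_tail)

# Toy gene (made-up numbers): log2 fold change 1.0, standard error 0.25
p_at_0 = wald_pvalue_greater_abs(1.0, 0.25, threshold=0.0)
p_at_075 = wald_pvalue_greater_abs(1.0, 0.25, threshold=0.75)

print(f"threshold 0:    p = {p_at_0:.3g}")
print(f"threshold 0.75: p = {p_at_075:.3g}")
```

The same estimate is clearly significant against 0 but not against 0.75, because the test statistic is now measured from the threshold rather than from zero. The p-value is not ignored; it answers a stricter question.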


Kevin has given the only good answer one can give here: results differ for a virtually infinite number of reasons. You have independent datasets, so the results will always be different. Statistical power, the true underlying biological effect, batch effects, sequencing depth, variance and dispersion, and the number of genes surviving the independent filtering can all play a role. Results are not predictable; that is why you run experiments and apply rigorous statistics to get meaningful and reliable results. There is no simple answer for this.

3.8 years ago

Each dataset that you process is obviously going to return different results due to a virtually infinite number of reasons. This is the simple explanation that I or anybody else can give without further specific information about these datasets. We would need to know, for example, whether the conditions under study are the same, and also the sample sizes and both the data-prep and sequencing methods - seeing the code that you have run when processing these would assist, too. You have not mentioned anything about p-value cut-offs, either. You are correct, though, in that variation in your replicates will be a key factor.

Kevin
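For intuition only, the following Python sketch shows how two datasets can react very differently to the same threshold. Both datasets are invented toy lists of (log2 fold change, standard error) pairs, the test is a simplified normal-approximation Wald test on raw p-values, and none of it reproduces the numbers in the question.

```python
import math

def pvalue(lfc, se, threshold=0.0):
    """Two-sided Wald p-value for H0: |LFC| <= threshold (normal approximation)."""
    z = (abs(lfc) - threshold) / se
    return min(1.0, math.erfc(z / math.sqrt(2.0)))

def count_hits(genes, threshold, alpha=0.05):
    """Number of genes whose p-value against the threshold falls below alpha."""
    return sum(pvalue(lfc, se, threshold) < alpha for lfc, se in genes)

# Hypothetical dataset A: mostly small effects, measured precisely
dataset_a = [(0.4, 0.05)] * 8 + [(1.5, 0.2)] * 2
# Hypothetical dataset B: mostly larger effects, plus a few small or noisy ones
dataset_b = [(1.5, 0.2)] * 6 + [(0.4, 0.05)] * 2 + [(0.3, 0.4)] * 2

for name, data in (("A", dataset_a), ("B", dataset_b)):
    at_0 = count_hits(data, threshold=0.0)
    at_075 = count_hits(data, threshold=0.75)
    print(f"dataset {name}: {at_0} hits at threshold 0, {at_075} at threshold 0.75")
```

In this toy setup dataset A loses most of its hits because its significant genes have small fold changes, whereas dataset B keeps most of them. The size of the drop depends on the distribution of effect sizes and on their standard errors, which in turn reflect replicate variability.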
