3.0 years ago
Kai_Qi
Hi,
I am trying to get a better understanding of how DESeq2 produces its results, and I have spent a few days trying to understand the p-value. I don't know if my understanding is right, so I am posting it here for discussion:
Let's assume we have 4 samples: control (S1, S2) and knockout (S3, S4).
- The read count of each gene follows a negative binomial distribution, so the reads in S1, S2, S3, and S4 will each have a mean and a variance (from the reads of each gene).
- When doing the analysis, the fold change, for example Log(S3+S4)-Log(S1+S2), generates a new distribution (let's name it S5), and this new distribution has its own mean and variance.
- Now I have a new distribution whose values represent the differences between the reads of each gene in control and knockout, and this distribution has a mean and a variance.
- Use the mean and variance to evaluate the probability of each value in the new distribution (S5). If the p-value < 0.05, we say that the difference in counts between control and knockout is significant; if not, we accept the null hypothesis.
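To make the last step concrete, here is a minimal Python sketch of how a p-value could be computed for a single gene with a Wald-type test: a log2 fold change is divided by its standard error and the result is compared against a standard normal distribution. This is only an illustration under strong assumptions, not DESeq2's actual code: the counts are assumed to be already normalized, the dispersion is assumed to be known, no shrinkage is applied, and the function name `wald_test_two_groups` and the example counts are hypothetical.

```python
import math


def wald_test_two_groups(control, knockout, dispersion):
    """Simplified per-gene Wald test for a two-group comparison.

    Assumptions (illustrative only, not DESeq2's implementation):
    counts are already normalized, the negative binomial dispersion
    is known, and no shrinkage of any estimate is applied.
    """
    mu_c = sum(control) / len(control)
    mu_k = sum(knockout) / len(knockout)
    if mu_c <= 0 or mu_k <= 0:
        raise ValueError("both group means must be positive")

    # Effect size: log2 fold change of knockout over control.
    log2_fc = math.log2(mu_k / mu_c)

    # Large-sample variance of log(mean) for negative binomial counts
    # (variance = mu + dispersion * mu^2) is (1/mu + dispersion) / n.
    var_ln = ((1 / mu_c + dispersion) / len(control)
              + (1 / mu_k + dispersion) / len(knockout))
    se_log2 = math.sqrt(var_ln) / math.log(2)  # convert ln scale to log2

    # Wald statistic and two-sided p-value from the standard normal.
    z = log2_fc / se_log2
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return log2_fc, se_log2, z, p_value


# Hypothetical counts for one gene: control (S1, S2), knockout (S3, S4).
log2_fc, se, z, p = wald_test_two_groups([100, 100], [200, 200], dispersion=0.01)
print(f"log2FC={log2_fc:.2f}, SE={se:.3f}, z={z:.2f}, p={p:.2e}")
```

In this sketch the test is run once per gene, so each gene gets its own fold change, standard error, and p-value; the variance enters through the standard error of that gene's fold change.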
I am not sure whether my understanding of how DESeq2 generates the p-value is right. Thanks in advance to anyone who comments on this.
Best,
Has been cross-posted: https://support.bioconductor.org/p/9136691/
Sorry for not mentioning it earlier. I did mention it when I posted on Bioconductor. I would like to get some mentoring on the principles; I have some trouble connecting what is in the books with practical questions.
Thanks for pointing it out.
Did you try to read the DESeq paper?
I read the tutorial and the paper. The tutorial is relatively easy, whereas the original paper is a little harder, so I turned to my college textbooks (in my native language) to try to understand it better. Thanks, I will keep reading the original paper.