Question

About logFC and p-value (and adj p-value)

9.6 years ago

bharata1803 ▴ 580

Hello,

I am trying to separate DE genes from non-DE genes, and I'm confused about how to read the p-value. As far as I understand, a smaller p-value indicates that the result is not a coincidence: the expression level in the two groups is "really" different. In that case, the logFC will usually also be fairly large (>1 or <-1).

If I filter for logFC near 0 (the non-DE genes), the p-values are quite big, which I take to mean the result could be a "coincidence". Does that mean the non-DE genes are actually a "coincidence", or does the p-value refer to the expression levels, so that a big p-value means the groups are not statistically different, which results in a logFC near 0? Is it enough to select non-DE genes by logFC near 0, or do I need to filter on both logFC near 0 and small p-value (less than 0.05 maybe)?

Thank you in advance.

RNA-Seq micro-array • 13k views

updated 2.9 years ago by Ram 45k • written 9.6 years ago by bharata1803 ▴ 580

score 9 · Answer 1 · 2015-11-20

I'll try to give some points to clarify:

When you test for DE, you start by assuming that the tested treatment has no effect (the null hypothesis). The p-value is the probability of obtaining the observed result, or a more extreme one, if the null hypothesis were true. So a small p-value means the data would be surprising under the null hypothesis, which is taken as evidence that the treatment does have an effect.
A large p-value indicates that the data are compatible with the null hypothesis, but the treatment could still have an effect. NB: in practice the null hypothesis is essentially never exactly true. If you could increase the sample size at will, you could get p-values as small as you want. So you can say which genes are affected, but you can't say which genes are not affected.
Small p-values usually go together with large absolute logFC. However, genes with many read counts have higher statistical power to detect DE, so in these cases you can see small p-values for genes with logFC close to zero.
In practice: to get DE genes, filter for FDR (an adjusted p-value) less than, say, 0.05. Optionally also filter for absolute logFC above a certain threshold, depending on your biological question. It usually doesn't make sense to invest time in genes with logFC close to 0 even if the p-value is very small. To get non-DE genes, you could filter for logFC close to zero and p-value above, say, 0.2, but keep in mind the interpretation of the p-value from above (Bayesian statistics suffer less from this "issue").
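
To make the filtering rules above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the gene names and numbers are made up, the column names `logFC`, `PValue` and `FDR` simply mimic what DE tools commonly report, and the cutoff of 0.1 for "logFC close to zero" is an arbitrary choice that you would adapt to your own data.

```python
# Hypothetical DE results table; all genes and values are invented for illustration.
results = [
    {"gene": "geneA", "logFC": 2.10, "PValue": 1e-6, "FDR": 1e-4},
    {"gene": "geneB", "logFC": -1.50, "PValue": 2e-4, "FDR": 0.004},
    # Highly expressed gene: tiny effect, but enough power for a small p-value.
    {"gene": "geneC", "logFC": 0.05, "PValue": 1e-5, "FDR": 5e-4},
    {"gene": "geneD", "logFC": 0.02, "PValue": 0.80, "FDR": 0.90},
    # Large but noisy change: neither confidently DE nor a safe non-DE pick.
    {"gene": "geneE", "logFC": 1.30, "PValue": 0.30, "FDR": 0.45},
]


def select_de(rows, fdr_cutoff=0.05, min_abs_logfc=0.0):
    """Genes with FDR below the cutoff and |logFC| at or above the threshold."""
    return [
        r["gene"]
        for r in rows
        if r["FDR"] < fdr_cutoff and abs(r["logFC"]) >= min_abs_logfc
    ]


def select_non_de(rows, max_abs_logfc=0.1, min_pvalue=0.2):
    """Genes with |logFC| near zero AND a large p-value (assumed cutoffs)."""
    return [
        r["gene"]
        for r in rows
        if abs(r["logFC"]) <= max_abs_logfc and r["PValue"] > min_pvalue
    ]


de_all = select_de(results)                        # FDR filter only
de_strong = select_de(results, min_abs_logfc=1.0)  # FDR plus logFC threshold
non_de = select_non_de(results)                    # near-zero logFC, large p-value

print("DE (FDR only):", de_all)
print("DE (FDR and |logFC| >= 1):", de_strong)
print("Candidate non-DE:", non_de)
```

In this toy table, geneC passes the FDR filter despite a logFC near zero, which is the high-count case described above, and it drops out once the logFC threshold is added. geneE ends up in neither list: a large p-value with a large logFC does not show the gene is unaffected.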