So I am trying to separate the DE genes from the non-DE genes. What confuses me is the p-value. As far as I understand, a smaller p-value indicates that the result is not a coincidence. In this case, the difference in expression level between the two groups is "really" different and not a coincidence. Such genes usually also have a fairly large logFC (>1 or <-1).

If I filter for logFC near 0 (the non-DE genes), the p-values are quite large, which I assume means the result could be a "coincidence". Does that mean the non-DE genes are actually a "coincidence", or does the p-value refer to the expression levels, so that a large p-value means the data are not statistically different, which results in a logFC near 0? Is it enough to select the non-DE genes by near-0 logFC alone, or do I need to filter on both near-0 logFC and small p-value (less than 0.05, maybe)?

When you test for DE you start by assuming that the tested treatment has no effect (the null hypothesis). The p-value is the probability of obtaining the observed result, or a more extreme one, if the null hypothesis were true. So a small p-value means the observed data would be unlikely under the null hypothesis, which is taken as evidence that the treatment does have an effect.

A large p-value indicates that the data are compatible with the null hypothesis, but the treatment could still have an effect. NB: In practice the null hypothesis is never exactly true. If you can increase the sample size at will, you can get p-values as small as you want. So you can say which genes are affected, but you can't say which genes are not affected.
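To make the sample-size point concrete, here is a minimal sketch in Python. It assumes a two-sample z-test with known standard deviation and an observed difference exactly equal to a small true effect; the function name and the numbers (effect 0.1, sd 1) are made up for illustration and do not come from any DE package.

```python
import math

def two_sided_p(effect, sd, n_per_group):
    """Two-sided p-value of a two-sample z-test (known sd, equal group sizes)."""
    # Standard error of the difference between two group means.
    se = sd * math.sqrt(2.0 / n_per_group)
    z = effect / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical tiny effect, held fixed while the sample size grows.
sample_sizes = [10, 100, 1000, 10000]
p_values = [two_sided_p(0.1, 1.0, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:>6}: p = {p:.3g}")
```

The effect never changes, yet the p-value drops below 0.05 once the groups are large enough. That is why a large p-value alone cannot show that a gene is unaffected.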

Small p-values usually go together with large absolute logFC. However, genes with many read counts have higher statistical power to detect DE; in these cases you can see small p-values for genes with logFC close to zero.
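A rough illustration of the read-count effect, as a sketch only: it treats the two counts as plain Poisson counts and uses a Wald test on the log ratio (standard error sqrt(1/a + 1/b)). That ignores biological dispersion and replicates, which real DE tools model, so the p-values here are far more optimistic than in practice. The function name and the counts are hypothetical.

```python
import math

def wald_p_for_log_ratio(count_a, count_b):
    """Two-sided Wald p-value for log(count_b / count_a), Poisson counts assumed."""
    log_ratio = math.log(count_b / count_a)
    # Delta-method standard error of the log ratio of two Poisson counts.
    se = math.sqrt(1.0 / count_a + 1.0 / count_b)
    z = log_ratio / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same small fold change (log2 FC = 0.1) at increasing expression levels.
fold = 2 ** 0.1
base_counts = [100, 1000, 10000]
p_by_depth = [wald_p_for_log_ratio(base, base * fold) for base in base_counts]

for base, p in zip(base_counts, p_by_depth):
    print(f"base count = {base:>6}: p = {p:.3g}")
```

The logFC is identical in all three cases; only the number of reads differs, and with it the p-value.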

In practice: To get DE genes, filter for FDR (an adjusted p-value) less than, say, 0.05. Optionally, also filter for absolute logFC above a certain threshold, depending on your biological question. It usually doesn't make sense to invest time in genes with logFC close to 0, even if the p-value is very small. To get non-DE genes you could filter for logFC close to zero and p-value above, say, 0.2, but keep in mind the interpretation of the p-value from above (Bayesian statistics suffers less from this "issue").
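The filtering described above could be sketched like this in Python. The table is made-up example data; the column names (logFC, PValue, FDR) mimic edgeR-style output but are an assumption, and the thresholds are just the "say" values from above plus two arbitrary logFC cutoffs (1.0 and 0.2) that you would need to choose for your own question.

```python
# Hypothetical per-gene results; in practice this comes from your DE tool.
results = [
    {"gene": "g1", "logFC": 2.30, "PValue": 1e-6, "FDR": 1e-5},
    {"gene": "g2", "logFC": -1.80, "PValue": 2e-5, "FDR": 1e-4},
    {"gene": "g3", "logFC": 0.05, "PValue": 0.80, "FDR": 0.90},
    {"gene": "g4", "logFC": -0.02, "PValue": 0.45, "FDR": 0.60},
    # High-count gene: tiny logFC but still a small p-value.
    {"gene": "g5", "logFC": 0.10, "PValue": 1e-4, "FDR": 5e-4},
]

def split_genes(rows, fdr_max=0.05, lfc_min=1.0, lfc_near_zero=0.2, p_min=0.2):
    """Return (DE gene names, candidate non-DE gene names)."""
    de, non_de = [], []
    for row in rows:
        abs_lfc = abs(row["logFC"])
        if row["FDR"] < fdr_max and abs_lfc > lfc_min:
            # Significant after adjustment and a large enough fold change.
            de.append(row["gene"])
        elif abs_lfc < lfc_near_zero and row["PValue"] > p_min:
            # Small fold change and no evidence against the null.
            non_de.append(row["gene"])
    return de, non_de

de_genes, non_de_genes = split_genes(results)
print("DE:", de_genes)
print("non-DE candidates:", non_de_genes)
```

Note that g5 ends up in neither list: its logFC is near zero but its p-value is small, so it is neither an interesting DE gene nor a safe non-DE candidate.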

I see. Thank you for the thorough explanation. So, in this case, maybe I will just use logFC close to zero to get the non-DE genes. My other question is: would it be useful to develop my own statistical test aimed at finding the non-DE genes, instead of using the logFC from a current method designed to find DE genes? Do you think the result would be more reliable if the test were specifically aimed at identifying non-DE genes?
