Question: Which value should be considered significant for RNA-seq: p-value or q-value?
0
3.2 years ago by
raya.girish20
raya.girish20 wrote:

Hello all,

I recently got a new project with three controls (c1, c2, c3) and three test samples (s1, s2, s3). The pipeline I followed was TopHat, Cufflinks, then Cuffdiff, with reads aligned to hg19. For differential gene expression I used Cuffdiff, which gave me a gene.diff file containing both a pvalue and a qval.

My question is: how should I filter for significantly upregulated or downregulated genes? Should I use qval (0.05) or pvalue (0.05)? If it's pvalue, I need help understanding why we would not use qval.

Also, I have heard that we need a minimum of three replicates to have statistical significance. Why is that so?

modified 3.2 years ago by Daniel3.8k • written 3.2 years ago by raya.girish20
1
3.2 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

Use the qval, not the raw pvalue. The "minimum of 3 replicates" is a good general rule of thumb since you need that many to have a decent shot at measuring variance. I personally recommend at least 6 replicates, which happen to fit nicely on a single lane of a HiSeq if you have a standard two group comparison setup.

0
3.2 years ago by
Daniel3.8k
Cardiff University
Daniel3.8k wrote:

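Imagine you're doing a single statistical test on some data and it comes back at the 99.9% level, i.e. a p-value of 0.001. If there were really no difference, you would see a result that extreme only about 0.1% of the time, so for that one test you can be pretty sure that what it tells you is right.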

But if you do 1,000 tests at that level, you should expect about one of them to say something untrue (100% - 99.9% = 0.1% of 1,000). If you do 30,000 tests, as you might when testing every gene, you're going to have a lot of false positives by the end.
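To put rough numbers on that, here is a small Python sketch of the arithmetic. It assumes the worst case for illustration: no gene is truly differentially expressed and the tests are independent, which real data will not satisfy exactly. The function names are made up for this example.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the cutoff by chance alone.

    Assumes every null hypothesis is true (nothing is really different).
    """
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 1000, 30000):
    for alpha in (0.001, 0.05):
        print(
            f"{n_tests:>6} tests at p < {alpha}: "
            f"~{expected_false_positives(n_tests, alpha):.1f} expected by chance, "
            f"P(at least one) = {prob_at_least_one(n_tests, alpha):.3f}"
        )
```

With a raw p-value cutoff of 0.05 and 30,000 tests, the expected count of chance hits is in the hundreds to thousands, which is why a raw p-value filter is not recommended here.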

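The q-value is a p-value that has been adjusted for the number of tests you're doing, so it accounts for the false positives you would otherwise pick up. Filtering at q < 0.05 means you expect roughly 5% of the genes you call significant to be false positives. That quantity is the False Discovery Rate (FDR), and there are multiple ways of controlling it.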
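As one concrete example of such an adjustment, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure, a widely used FDR method. It is only an illustration of the idea, not a reproduction of what any particular tool does internally, and the function name is made up for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value can be
    # capped by the ones above it, which keeps the result monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:<6} adjusted = {q:.3f}")
```

Each adjusted value is at least as large as the raw p-value it came from, so fewer genes pass the same 0.05 cutoff.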

Long story short: use the q-value. It reduces the number of false positives.
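In practice the filter could look something like the sketch below. It assumes the Cuffdiff gene-level table is tab-delimited with columns named `gene`, `status`, `log2(fold_change)` and `q_value`; check the header of your own file, since names can differ between versions. The function name and the inline sample rows are invented for illustration.

```python
import csv


def split_significant_genes(lines, alpha=0.05):
    """Split genes into (up, down) lists using a q-value cutoff.

    `lines` is any iterable of tab-delimited text lines with a header row,
    for example an open file handle. Column names are assumptions.
    """
    up, down = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["status"] != "OK":  # skip genes that were not actually tested
            continue
        if float(row["q_value"]) > alpha:
            continue
        log2fc = float(row["log2(fold_change)"])  # also parses "inf" / "-inf"
        if log2fc > 0:
            up.append(row["gene"])
        elif log2fc < 0:
            down.append(row["gene"])
    return up, down


# Made-up rows standing in for a real file.
sample = [
    "gene\tstatus\tlog2(fold_change)\tp_value\tq_value",
    "GENE_A\tOK\t2.0\t0.0001\t0.01",
    "GENE_B\tOK\t-1.5\t0.001\t0.03",
    "GENE_C\tOK\t1.0\t0.04\t0.20",
]
up, down = split_significant_genes(sample)
print("up:", up)
print("down:", down)
```

For a real run you would pass an open file handle instead of the sample list. Note that GENE_C has a raw p-value below 0.05 but is dropped because its q-value is not.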