Question

Which value should be considered significant for RNA-seq: p-value or q-value?

7.2 years ago

raya.girish ▴ 30

Hello all, I recently got a new project with three controls (c1, c2, c3) and three test samples (s1, s2, s3). My pipeline was TopHat, Cufflinks, Cuffdiff, and I aligned my reads to hg19. For differential gene expression I used Cuffdiff, which gave me a gene.diff file containing both a pvalue and a qval. My question is: how should I filter for significantly upregulated or downregulated genes? Should I use qval (0.05) or pvalue (0.05)? If it's pvalue, I need help understanding why we are not considering qval. Also, I have heard that a minimum of three replicates is needed for statistical significance. Why is that so?

RNA-Seq statistical significance pvalue • 4.5k views

updated 7.2 years ago by Daniel ★ 4.0k • written 7.2 years ago by raya.girish ▴ 30

score 1 · Answer 1 · 2017-02-03

Use the qval, not the raw pvalue. The "minimum of 3 replicates" is a good general rule of thumb, since you need that many to have a decent shot at estimating the variance within each group. I personally recommend at least 6 replicates, which happen to fit nicely on a single lane of a HiSeq if you have a standard two-group comparison setup.

score 0 · Answer 2 · 2017-02-03

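Imagine you run a statistical test on some data and get a p-value of 0.001. That means that, if there were really no difference, a result this extreme would turn up only 0.1% of the time, so for a single test you can be pretty sure that what it tells you is right.

But if you do 1,000 such tests, you should expect about one of them to report a difference that isn't there (1,000 × 0.1% = 1). If you do 30,000 tests, roughly one per gene, you're going to have a lot of false positives by the end: about 30 at p < 0.001, and up to about 1,500 at the usual p < 0.05 cutoff if none of the genes truly differ.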
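To put rough numbers on this, here is a small Python sketch. It assumes the worst case where no gene is truly changed and the tests are independent, which real RNA-seq data will not satisfy exactly; the function names are made up for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the cutoff when nothing is truly different."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# (number of tests, p-value cutoff) pairs, chosen to match the numbers discussed above
for n_tests, alpha in [(1000, 0.001), (30000, 0.001), (30000, 0.05)]:
    expected = expected_false_positives(n_tests, alpha)
    chance = prob_at_least_one_false_positive(n_tests, alpha)
    print(f"{n_tests} tests at p < {alpha}: about {expected:.0f} false positives expected, "
          f"P(at least one) = {chance:.3f}")
```

The point is only that the expected count of false positives grows with the number of tests, even though each individual test looks trustworthy.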

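The q-value is a p-value adjusted for the number of tests you ran. Filtering at q < 0.05 means you accept that about 5% of the genes you call significant may be false positives, rather than 5% of all the genes you tested. That accepted proportion is called the False Discovery Rate (FDR), and there are multiple ways of calculating the adjustment; Benjamini-Hochberg is a common one.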
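As an illustration of one such adjustment, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure. It is not necessarily the exact computation behind the qval in your file, and the p-values below are toy numbers made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (made up for illustration, not real data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
toy_q = benjamini_hochberg(toy_p)

n_by_p = sum(p < 0.05 for p in toy_p)
n_by_q = sum(q < 0.05 for q in toy_q)
print(f"passing p < 0.05: {n_by_p}, passing q < 0.05: {n_by_q}")
```

With these toy numbers, fewer values survive the adjusted cutoff than the raw one, which is the behaviour you want when testing many genes at once.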

Long story short: Use the q-value, it reduces the number of false positives.
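In practice, the filtering could look something like the sketch below. It assumes a tab-separated table with columns named `gene`, `log2(fold_change)`, `p_value` and `q_value`; check the header of your own file, since the names may differ. The rows are toy values made up for illustration, and whether a positive fold change means "up in test" depends on the order in which the samples were given to Cuffdiff.

```python
import csv
import io

# Toy table (made-up values); in real use, open your own .diff file instead.
TOY_ROWS = [
    ["gene", "log2(fold_change)", "p_value", "q_value"],
    ["GENE_A", "2.1", "0.0001", "0.004"],
    ["GENE_B", "-1.7", "0.0004", "0.01"],
    ["GENE_C", "0.9", "0.03", "0.21"],   # passes p < 0.05 but not q < 0.05
    ["GENE_D", "-0.2", "0.6", "0.8"],
]
TOY_DIFF = "\n".join("\t".join(row) for row in TOY_ROWS) + "\n"


def significant_genes(handle, q_cutoff=0.05):
    """Split genes with q_value below the cutoff into up- and down-regulated lists."""
    up, down = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["q_value"]) >= q_cutoff:
            continue
        log2_fc = float(row["log2(fold_change)"])
        if log2_fc > 0:
            up.append(row["gene"])
        elif log2_fc < 0:
            down.append(row["gene"])
    return up, down


up, down = significant_genes(io.StringIO(TOY_DIFF))
print("up:", up)
print("down:", down)
```

Note that GENE_C would have been called significant on the raw p-value alone, which is exactly the kind of call the q-value filter is meant to drop.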