Problem in understanding False Discovery Rate (FDR)
2.0 years ago

Hi friends, I am having trouble understanding the concept of FDR (False Discovery Rate) in multiple hypothesis testing. Maybe the question is silly; please bear with me. When we talk about the p-value for a single hypothesis test, a value below 0.05 is taken to mean there is a 5% chance that the result is a false positive for that particular hypothesis. So far, no problem. But when there are, say, 20,000 tests, why would 5% of the tests be false positives? The p-value we are talking about is for a single hypothesis, so why are we connecting it with 20,000 tests? Each hypothesis is a separate entity, independent of the others. So will 5% of the 20,000 be false positives?


"When we talk about the p-value for a single hypothesis test, a value below 0.05 is taken to mean there is a 5% chance that the result is a false positive for that particular hypothesis. So far, no problem."

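Actually, even this is not strictly correct. A p-value is the probability of seeing data at least as extreme as yours if the null hypothesis is true; a p-value < 0.05 by itself tells you almost nothing about the probability that the alternative hypothesis (anything other than the null) is true.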

I provide a Bayesian view in the slide (there are other views, e.g. likelihood-based ones). Even with an uninformative prior (the "toss-up" case here), a p-value still cannot be interpreted as the probability that the alternative hypothesis is true.
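
To make the Bayesian point concrete, here is a minimal sketch that treats "p < 0.05" as a yes/no outcome and uses Bayes' rule to ask how likely a real effect is, given a significant result. The prior probabilities and the power of 0.8 are illustrative assumptions, and this simplified calculation is not necessarily the one on the slide.

```python
def prob_real_given_significant(prior: float, power: float, alpha: float = 0.05) -> float:
    """P(alternative true | p < alpha), by Bayes' rule.

    prior: assumed probability that the alternative is true before seeing data
    power: assumed probability of p < alpha when the alternative is true
    alpha: probability of p < alpha when the null is true
    """
    true_positive = prior * power         # real effect and significant
    false_positive = (1 - prior) * alpha  # no effect, but significant anyway
    return true_positive / (true_positive + false_positive)


for prior in (0.5, 0.1, 0.01):  # "toss-up", long shot, very long shot
    print(f"prior={prior:<5} P(real | p<0.05) = {prob_real_given_significant(prior, power=0.8):.2f}")
```

With these assumptions the same "p < 0.05" corresponds to roughly 0.94, 0.64 and 0.14, so the answer depends heavily on how plausible the hypothesis was to begin with.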


Very nice slide, thanks! It implies that, as a scientist, you should think about the plausibility of a new hypothesis before you collect new data. But this is difficult, of course...


With FDR correction and many tests, it is magically resolved, because of magic (I asked on stats.stackexchange and this was the answer). But for a small number of tests, sure, prior beliefs and expected effect size are important...
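
One way to demystify the "magic": with thousands of p-values, the data themselves can estimate what fraction of nulls are true (often called π0), which plays the role of the prior. Below is a minimal sketch of a Storey-style π0 estimator; it is one such approach, not necessarily what the stats.stackexchange answer described. The simulated data (90% true nulls with uniform p-values, 10% real effects with p-values drawn from (0, 0.01)) are an arbitrary assumption.

```python
import random


def estimate_pi0(pvalues: list[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the fraction of true nulls.

    True-null p-values are uniform on (0, 1), so about pi0 * m * (1 - lam)
    of them land above lam, while p-values from real effects rarely do.
    """
    above = sum(p > lam for p in pvalues)
    return min(1.0, above / (len(pvalues) * (1 - lam)))


random.seed(1)
m_null, m_alt = 18_000, 2_000
pvalues = [random.random() for _ in range(m_null)]          # true nulls
pvalues += [random.uniform(0, 0.01) for _ in range(m_alt)]  # real effects (assumed)
print(f"estimated pi0 = {estimate_pi0(pvalues):.3f} (true value 0.9)")
```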


Hi, I am trying to understand FDR in the following way. Please let me know whether my understanding is right or wrong. Suppose 100 people are each tossing a biased coin (95% probability of Heads and 5% probability of Tails). Each person will get Heads most of the time and Tails only rarely. But when 100 people toss, about 5 of them will get Tails. Are these 5 people like the FDR in our case? And does this 5% become a huge number when we test 1000 genes, i.e., when 1000 people are tossing? Thanks
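
The coin analogy can be checked with the binomial distribution: if each of n tossers independently gets Tails with probability 0.05, the number of Tails follows Binomial(n, 0.05). A short sketch:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_tail = 0.05
for n in (100, 1000):
    print(f"n={n}: expected Tails = {n * p_tail:g}")

# With 100 tossers, exactly 5 Tails is only the single most likely outcome.
print(f"P(exactly 5 of 100) = {binom_pmf(5, 100, p_tail):.3f}")
print(f"P(no Tails at all)  = {binom_pmf(0, 100, p_tail):.4f}")
print(f"P(10 or more Tails) = {sum(binom_pmf(k, 100, p_tail) for k in range(10, 101)):.3f}")
```

So on average 5 of 100 (and 50 of 1,000) get Tails, but the count varies from run to run; exactly 5 happens only about 18% of the time. In the testing analogy, the Tails correspond to false positives among true nulls; the FDR itself is a proportion, the expected fraction of false positives among the results you call significant.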


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

2.0 years ago

No statistical test is 100% accurate; its conclusions are probabilistic. Say you test whether two means are significantly different from each other, and the test gives P = 0.05, so you reject the null hypothesis. But the test isn't absolute; it has an inherent error rate. Let's say that when the null hypothesis is actually true, the test still wrongly rejects it once in ten tests. If you only use the test once, no big deal. The trouble arises when you use it many times. Say you repeat the test for a hundred different populations in which the null is true. Since the test is wrong 1/10th of the time, with 100 tests you can expect to be wrong 10 times on average.
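
The arithmetic behind this example, assuming (for illustration) that all m null hypotheses are true and the tests are independent: the expected number of false positives is m × α, and the probability of at least one is 1 − (1 − α)^m.

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Mean number of rejections when all m nulls are true."""
    return m * alpha


def prob_at_least_one(m: int, alpha: float) -> float:
    """P(at least one false positive) for m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


for m, alpha in [(1, 0.1), (100, 0.1), (20_000, 0.05)]:
    print(f"m={m:>6}, alpha={alpha}: expected={expected_false_positives(m, alpha):g}, "
          f"P(>=1)={prob_at_least_one(m, alpha):.5f}")
```

With 100 tests at a 1-in-10 error rate you expect 10 false positives, and at least one is practically certain.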

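That's where multiple-testing correction comes in. Methods such as Bonferroni and Benjamini-Hochberg adjust your p-values for the number of tests (and, for Benjamini-Hochberg, for the distribution of the p-values of all tests). They control different things: Bonferroni controls the family-wise error rate, the probability of making even one false positive, while Benjamini-Hochberg controls the false discovery rate (FDR), the expected proportion of false positives among the tests you call significant. This lets you run a test many times without accumulating a large number of false positives.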
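
A minimal pure-Python sketch of both adjustments (the example p-values are made up). The BH version is the standard step-up adjustment and should give the same values as R's `p.adjust(p, method = "BH")`.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", [round(p, 3) for p in bonferroni(pvals)])
print("BH:        ", [round(p, 3) for p in benjamini_hochberg(pvals)])
```

At a 0.05 cutoff, Bonferroni keeps two of these four tests while BH keeps all four: BH is less strict because it tolerates a controlled proportion of false discoveries instead of trying to avoid any at all.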

2.0 years ago

Let's imagine we compare gene expression for 20,000 genes between two groups that are actually identical, so no genes are differentially expressed. Even so, on average 1/20 of your tests will give a p-value < 0.05 (0.05 = 1/20).

You will report about 1000 genes as differentially expressed. Not cool, because the right answer is 0.
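
A quick simulation of this scenario, assuming a valid test so that the p-values of true nulls are uniform on (0, 1). As a simplification it draws those p-values directly instead of simulating expression data, and applies the Benjamini-Hochberg step-up rule.

```python
import random


def bh_num_rejections(pvalues: list[float], q: float = 0.05) -> int:
    """Number of tests called significant by the Benjamini-Hochberg step-up rule:
    the largest k such that the k-th smallest p-value is <= k / m * q."""
    m = len(pvalues)
    k_max = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k / m * q:
            k_max = k
    return k_max


random.seed(42)
m = 20_000
# Both groups identical: every null is true, so each p-value is Uniform(0, 1).
pvalues = [random.random() for _ in range(m)]

raw_hits = sum(p < 0.05 for p in pvalues)
print(f"p < 0.05 without correction: {raw_hits} genes (expected about {m * 0.05:.0f})")
print(f"significant after BH at FDR 0.05: {bh_num_rejections(pvalues)} genes")
```

The uncorrected count lands near 1000, while BH usually reports 0 here: when every null is true, the chance that BH reports anything at all is at most 0.05.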