Question: Should p-values resulting from the experimental validation of genes selected by FDR also be FDR-corrected?
3.4 years ago, correlationmatrix wrote:

Let's say you perform a large number of statistical tests based on some high-throughput screen. Because some of the resulting p-values could be small purely by chance, you apply an FDR correction and proceed only with the genes whose adjusted p-values are below 0.05, i.e. a set in which fewer than 5% of the hits are expected to be false positives. Statistical tests on the experimental follow-up results then show that X of those genes indeed have p-values below your desired threshold. The question then becomes: should one stop here and consider the original screen results for these particular genes successfully validated, or should one also apply an FDR correction to the validation p-values?

modified 3.4 years ago by Nicolas Rosewick • written 3.4 years ago by correlationmatrix

Yep, you should FDR adjust for the number (X) of hypotheses tested in your follow-up experiment.

Re "Subsequent statistical tests applied to the experimental follow-up results then gives that X number of those genes indeed had p-values below your desired threshold": does this imply that you are re-checking the expression changes in the same set of samples?

written 3.4 years ago by russhh

Sorry it is hard to follow...

What kind of subsequent statistical tests are you referring to?

written 3.4 years ago by Benn

A concrete example that may clarify what I mean:

Suppose the original screen was an shRNA dropout screen, in which the changes in abundance of shRNAs that mediate knockdown of various genes were assessed between two time points. According to this screen, shRNAs targeting Genes 1, 2, 3, 4 and 5 were significantly (FDR < 0.05) depleted from the cell population at the later time point. This indicates that these genes may be essential to this cell type (knockdown reduces survival, so the corresponding shRNAs are under negative selection).

To validate this, follow-up experiments were designed to knock down only Genes 1, 2, 3, 4 and 5, using either one of the original shRNAs targeting those genes or other shRNAs that have been confirmed to mediate knockdown efficiently. The validation experiment had a different setup and measured only the decrease in growth rate following knockdown of a given gene. It found that some of the genes assessed in this way indeed yielded significant decreases in growth rate at a raw p-value threshold of 0.05. Would this be enough to conclude that the original screen hits were true, or should those validation p-values also be corrected?

The reasoning is that if the original p-values from the large-scale screen were falsely significant because of the multiple testing problem, then the raw p-values of the corresponding genes would be unlikely to come out significant again by chance in the validation experiment.
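
As a rough check on this reasoning, the sketch below computes how often chance alone would produce a raw p < 0.05 in the validation. It assumes, purely for illustration, that all five screen hits are false positives and that the five validation tests are independent; the names `alpha` and `m` are introduced here and are not part of the original question.

```r
# Assumption: all m validated genes are true nulls and the tests are independent.
alpha <- 0.05  # raw significance threshold used in the validation
m <- 5         # number of genes taken into the validation experiment

# Chance that one given null gene passes validation by chance
p_single <- alpha

# Chance that at least one of the m null genes passes by chance
p_any <- 1 - (1 - alpha)^m

print(c(single = p_single, at_least_one = p_any))
```

Under these assumptions a single false hit rarely validates by chance, but the chance that at least one of five does is about 23%, which is the usual argument for also correcting the validation p-values.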

modified 3.4 years ago • written 3.4 years ago by correlationmatrix

OK, thanks for the clear explanation!

Your second (validation) experiment must be seen as a new experiment, with its own null hypothesis and so on. The null hypothesis will probably be the same as in your first one (i.e. no difference). If your second experiment is again high-throughput, then yes, you'll need to correct for multiple testing again. If you only look at a few genes (say your 5 genes of interest), then it is not really necessary.

I hope this explains it a bit.

written 3.4 years ago by Benn

Ok. In this case the validation experiment only considers 5 genes. So strictly speaking, multiple hypotheses are being tested and additionally the experiment uses another setup, so the null hypothesis would be different. I guess then it would be appropriate to correct for multiple testing.
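
A minimal sketch of what correcting the five validation p-values could look like in R. The p-values below are made up for illustration and do not come from the experiment described in this thread; `p_validation` and `result` are hypothetical names.

```r
# Hypothetical raw p-values for the five validated genes (illustrative only)
p_validation <- c(Gene1 = 0.004, Gene2 = 0.012, Gene3 = 0.035,
                  Gene4 = 0.045, Gene5 = 0.200)

# Benjamini-Hochberg adjustment across the five validation tests
p_bh <- p.adjust(p_validation, method = "BH")

# Compare which genes pass at 0.05 before and after adjustment
result <- data.frame(raw = p_validation,
                     BH = p_bh,
                     raw_sig = p_validation < 0.05,
                     BH_sig = p_bh < 0.05)
print(result)
```

With these made-up numbers, fewer genes stay below 0.05 after adjustment than before, so borderline raw p-values are the ones most affected by the choice to correct.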

written 3.3 years ago by correlationmatrix
3.4 years ago, Nicolas Rosewick (Belgium, Brussels) wrote:

You should correct your nominal p-values only once.

In R:

# p: numeric vector of nominal (unadjusted) p-values
fdr <- p.adjust(p, method = "fdr")  # "fdr" is an alias for "BH" (Benjamini-Hochberg)

FYI, it's always a good idea to check the distribution of your nominal p-values before multiple-testing correction:

hist(p, breaks = 20)

Here's a nice post about interpreting p-value distributions: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/
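
To see what such a histogram can look like, here is a small simulated sketch. The data are entirely artificial: the split into 900 nulls and 100 true effects, and the Beta distribution used for the latter, are assumptions chosen only to produce a recognisable shape.

```r
set.seed(1)  # for reproducibility of the simulated values

# Simulated p-values (illustrative only): 900 true nulls are uniform on [0, 1];
# 100 true effects are drawn from a Beta distribution concentrated near 0.
p_null <- runif(900)
p_alt <- rbeta(100, shape1 = 0.5, shape2 = 10)
p_sim <- c(p_null, p_alt)

# Roughly flat background with an excess of small p-values near 0
hist(p_sim, breaks = 20, main = "Simulated nominal p-values", xlab = "p-value")

# Adjust once, over the full set of tests
fdr_sim <- p.adjust(p_sim, method = "fdr")
```

A roughly flat histogram with a peak near zero is the pattern one hopes for; other shapes may point to a problem with the test or its assumptions, as discussed in the linked post.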

written 3.4 years ago by Nicolas Rosewick