Question: Why use the qvalue package instead of p.adjust in R stats?
9
gravatar for rmccloskey
2.8 years ago by
rmccloskey170
Canada
rmccloskey170 wrote:

A lot of papers I've read have used the qvalue package from Bioconductor to do multiple testing correction. There is a function in the basic R stats package called p.adjust which seems to do the same thing. What is the advantage of the qvalue package, and why is it so frequently used instead of the built in R version?

qvalue R • 9.7k views
ADD COMMENTlink modified 2.8 years ago by Devon Ryan73k • written 2.8 years ago by rmccloskey170
17
gravatar for Devon Ryan
2.8 years ago by
Devon Ryan73k
Freiburg, Germany
Devon Ryan73k wrote:

p.adjust() and the qvalue package aren't actually doing exactly the same thing. They are doing quite similar things with similar ends in mind, but the algorithms are different, so they produce different results (Gordon Smyth, who wrote p.adjust(), has a short summary of the history here and Tim Triche gives a nice explanation of the differences in the same thread). Interestingly, the packages I use almost always use p.adjust() rather than the qvalue package, though if you read a paper that uses them the authors might actually say "q-value" rather than "BH adjusted p-value".

BTW, q-values are a bit more difficult to calculate and you actually need quite a few data points. In effect, you have to use the data at hand to estimate the expected null rate and then multiply that by the BH adjusted p-value. Consequently, p.adjust() ends up being more generally applicable, but in cases where q-values are appropriate they tend to give more power.
 

ADD COMMENTlink written 2.8 years ago by Devon Ryan73k

Thanks for the great explanation!

ADD REPLYlink written 2.8 years ago by rmccloskey170

Hello--The link under the phrase "the same thread" points to "https://stat.ethz.ch/pipermail/bioconductor/attachments/20121219/00dc27b1/attachment.pl" which is a perl file. What link did you intend? Thanks!

Something to add: When I went to the original post, below it is the following statement:

An embedded and charset-unspecified text was scrubbed... Name: not available URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20121219/00dc27b1/attachment.pl

But I'd love to read more explanations of the differences!

ADD REPLYlink modified 7 months ago • written 7 months ago by ariel.balter110
0
gravatar for Jean-Karim Heriche
2.8 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche13k wrote:

The qvalue package implements FDR methods taylored for genomics studies. The p.adjust function implements "standard" FDR methods. The qvalue package also has a plotting function and a GUI.

ADD COMMENTlink written 2.8 years ago by Jean-Karim Heriche13k

I know I'm a bit late to the party, but I wanted to point out that the qvalue package is not necessarily tailored to genomic studies.  The qvalue algorithm has underlying assumptions that only allow its applicability when there is no dependence in the data, and with a specific p-value distribution.  IF there is dependence in the data, and a "U-shaped" p-value distribution, then qvalue is not applicable (according to the q-value package vignette).  I'm honestly not sure if dependence affects the BH method of p.adjust.
 

ADD REPLYlink modified 22 months ago • written 22 months ago by reilly.brian.m40

Thanks for the clarification. My understanding is that the BH procedure still controls the FDR when the tests statistics are positively correlated. The BY modification can be applied for other cases but is more conservative.
Edit: I associated qvalue with genomics studies from this paper.

ADD REPLYlink modified 22 months ago • written 22 months ago by Jean-Karim Heriche13k

Yup that paper is the same paper that I associate the q-value with as well.  The vignette I was speaking of was from the R/Bioconductor qvalue package, which I think was developed from the authors of the PNAS paper.  Here is the vignette, specifically section 5.2 talks about the applicability of the method with different pvalue distributions.

I have also read that the BH procedure is applicable in positive dependence, and that supposedly the BHY method is applicable in both positive and negative dependence.   My main issue was understanding what positive/negative correlation/dependence looked like, and how to tell when you have those scenarios.

ADD REPLYlink written 21 months ago by reilly.brian.m40
1

The property that must be satisfied by the test statistics (if they are not independent) in order for the BH procedure to be guaranteed to work is called positive regression dependence on a subset (abbreviated as PRDS). As I understand it, this means that given a series of tests, the probability of a test corresponding to a true null hypothesis doesn't decrease as the p-values increase. Or put another way, when you order the p-values of your tests from lowest to highest, the probability of the null hypothesis being true should not decrease. This means you should know how the distributions of your test statistics depend on the null hypotheses before applying the BH procedure. In general we don't know if the PRDS condition is satisfied. However, my initial understanding was that the PRDS condition is satisfied if the test statistics are positively correlated to each other but this has apparently only been proven for Gaussian distributed one-sided test statistics. It seems that in all other cases, one should be applying the BY procedure. So strictly speaking, before using the BH procedure, one should justify that either the test statistics are independent or that the PRDS condition is satisfied. My suspicion is that in practice the BH procedure is robust enough to departure from PRDS to still be controlling the FDR adequately.

ADD REPLYlink written 21 months ago by Jean-Karim Heriche13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1370 users visited in the last hour