Question

Volcano plot p-value or FDR?

8

8.0 years ago

balestrieri.c ▴ 80

Hi everyone, I would like to know whether there is any statistical motivation for plotting -log(p-value) vs log(FC). Why not plot -log(FDR) vs log(FC) directly?

Thanks!

RNA-Seq • 22k views

Updated 3.0 years ago by dariober 14k • written 8.0 years ago by balestrieri.c ▴ 80

0

It's always good practice to use corrected p-values instead of raw p-values.

8.0 years ago by Nicolas Rosewick 10k

5

8.0 years ago

Benn 8.3k

I always use FDR-corrected p-values, also in volcano plots.

In these plots I use -log10(adjusted p-value) on the y-axis and log2(FC) on the x-axis.
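To make the axes concrete, here is a minimal sketch in Python of how those two coordinates could be computed. The function name `volcano_coords` and the input values are made up for illustration; they do not come from any package or dataset mentioned in this thread.

```python
import math

def volcano_coords(fold_changes, adj_pvalues):
    """Return (log2 fold change, -log10 adjusted p-value) pairs.

    Hypothetical helper: fold changes must be > 0 and p-values in (0, 1].
    """
    return [
        (math.log2(fc), -math.log10(p))
        for fc, p in zip(fold_changes, adj_pvalues)
    ]

# Made-up example: an up-regulated, a down-regulated and an unchanged gene.
coords = volcano_coords([4.0, 0.5, 1.0], [0.001, 0.01, 1.0])
for x, y in coords:
    print(f"log2FC = {x:+.2f}, -log10(adj p) = {y:.2f}")
```

Swapping the adjusted p-values for raw p-values only changes the y-coordinate; the x-axis stays the same.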

1

3.0 years ago

dariober 14k

Old post, but since I've been asked the same question, here's my 2p... Obviously, a volcano plot is just an x-y plot, so it doesn't really matter which variables you use as long as they are meaningful. In any case, judging by this paper and the code in the limma package, where I think the volcano plot was first introduced, the "real thing" uses -log10(p-value).

I think it makes sense to use the raw p-value instead of the FDR or other adjustments, since after correction different p-values may be squashed into the same value. A somewhat extreme example:

p <- seq(0.01, 1, by = 0.01)
p
# 0.01 0.02 0.03 0.04 0.05 0.06 0.07 ... 1

p.adjust(p, method = 'fdr')
# 1 1 1 1 1 ... 1

I guess it depends on whether this squashing is desirable or not. In fact, I disagree that adjusted p-values are always preferable: you could have different priors for different genes, and in such cases you don't want to lose resolution after adjustment.
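To see where the squashing comes from, here is a rough pure-Python sketch of the Benjamini-Hochberg adjustment, which is what `p.adjust(..., method = 'fdr')` computes. The function name `bh_adjust` and the input values are my own for illustration, and this is a re-implementation under that assumption, not R's actual code. Each p-value is scaled by n/rank, and then a running minimum from the largest p-value downwards enforces monotonicity; that running minimum is the step that creates ties.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values, from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = n - pos  # rank 1 is the smallest p-value
        # Scale by n/rank, then never exceed the value of a larger p-value.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up example: three distinct raw p-values end up with the same FDR.
raw = [0.001, 0.02, 0.03, 0.04, 0.5]
adj = bh_adjust(raw)
print(adj)  # approximately [0.005, 0.05, 0.05, 0.05, 0.5], up to floating-point error

# The extreme case from above: every adjusted value is (approximately) 1.
extreme = bh_adjust([i / 100 for i in range(1, 101)])
```

On a volcano plot, the genes with raw p-values 0.02, 0.03 and 0.04 would sit on one horizontal line if the y-axis used the adjusted values, which is the loss of resolution described above.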

score 4 · Accepted Answer · 2016-05-10

As far as I know from reading different papers and reports, there is none. You can see here a report that shows a volcano plot with -log10(FDR) vs log2(FC). At the end of the day, it is up to the user to choose the scale that gives the most representative visualization.

By definition, a volcano plot shows significance versus fold change on the y- and x-axes, respectively. So the significance can be FDR-corrected as well; in that case you are restricting the points called significant to a much stricter subset, which gives more confidence in the visualization. There should be no statistical motivation other than making the plot more reliable, with fewer false positives among the significant points.