Question

How to represent multiple p-values?

0

9.5 years ago

Nicolas Rosewick 10k

Hi,

I ran several simulations using a gene list of interest and different publicly available cancer gene lists to see if there is an enrichment.

So I have a matrix of p-values like this

         param1    param2    param3
list1    pval_a    pval_b    pval_c
list2    pval_d    pval_e    pval_f
list3    pval_g    pval_h    pval_i

I know it's difficult to compare p-values because each cancer gene list has a different size, so the statistical power will be different. But do you have any advice on how to represent these results in an easily readable plot?

Thanks

p-values plot • 2.7k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Nicolas Rosewick 10k

Ram · Answer 1 · 2014-11-13

4

9.5 years ago

David Westergaard ★ 1.5k

Yes, ditch the p-values and show the effect size instead (or at least show both). As you say, it is meaningless to compare p-values across comparisons, since the underlying data for each test will be different. Also remember that a p-value is NOT proportional to the effect size.

You can easily make a plot in ggplot2 with lists on the Y-axis and parameters on the X-axis, and colour/size the points according to effect size/p-value, respectively.

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by David Westergaard ★ 1.5k

0

Thanks. How do you introduce the effect size in the plot?

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Nicolas Rosewick 10k

2

library(ggplot2)
# Toy data: 3 lists x 3 parameters, with random p-values and effect sizes
df <- data.frame(samples = rep(c('L1', 'L2', 'L3'), each = 3),
                 params = c('param1', 'param2', 'param3'),
                 pval = runif(9),
                 effect_size = rnorm(9, mean = 10))
# Normalize within each list so effect sizes are comparable across lists
df$effect_norm <- ave(df$effect_size, df$samples, FUN = function(x) x / max(x))
# Point size shows normalized effect size, colour shows -log10(p-value)
p <- ggplot(df, aes(x = samples, y = params)) +
  geom_point(aes(size = effect_norm, colour = -log10(pval)))
p

Not quite sure what you mean, but I attached some code showing the general idea of the plot.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by David Westergaard ★ 1.5k

0

Thanks. Pretty interesting. In my case the list sizes are fixed (so L1 will have the same value for each param used), so the point size will be the same for a given list regardless of the param.

ADD REPLY • link 9.5 years ago by Nicolas Rosewick 10k

0

By "effect size", David meant the magnitude of the effect (how strong the enrichment is), not the length of your lists.

Also, why don't you try to perform some kind of meta-analysis, to get a single p-value at the end?

Or, if you simply want to plot the p-values, I would suggest log-transforming them and producing a dot plot.
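
As a rough sketch of the meta-analysis idea, Fisher's method combines the p-values in each row of the matrix into a single p-value per list. The matrix `pmat` and its values are made up for illustration, and the method assumes the combined p-values are independent, which may not hold when the same gene list is tested under several parameters.

```r
# Hypothetical p-value matrix: rows are gene lists, columns are parameters
pmat <- matrix(c(0.01,  0.20,  0.03,
                 0.50,  0.40,  0.60,
                 0.001, 0.002, 0.05),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("list1", "list2", "list3"),
                               c("param1", "param2", "param3")))

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, for k independent p-values
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

combined <- apply(pmat, 1, fisher_combine)  # one combined p-value per list

# Log-transform for plotting: larger values mean smaller p-values
neg_log10_p <- -log10(pmat)
```

The values in `neg_log10_p` can then be drawn as a dot plot, for example with `geom_point` in ggplot2.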

ADD REPLY • link 9.5 years ago by Martombo ★ 3.1k

Ram · Answer 2 · 2014-11-13

Adding to David Westergaard's excellent answer, the typical way to visualise these sorts of results in epidemiology and other disciplines that do a lot of between-study comparisons is the forest plot. Basically, you'd plot the effect size and 95% CI for each parameter estimate. The choice of effect size, and the method of getting the CI from the p-value, will depend on exactly what the studies are measuring. But you'd end up with something like this (using the data frame from David Westergaard's answer):

# Placeholder CI half-widths, simulated for illustration only
df$ci <- abs(rnorm(9, 0, df$effect_size)) / 2
forest_p <- ggplot(df, aes(samples, effect_size))
forest_p + geom_point(size = 3) +
           geom_pointrange(aes(ymin = effect_size - ci, ymax = effect_size + ci)) +
           geom_hline(yintercept = 0) +  # reference line at zero effect
           facet_wrap(~params) +
           coord_flip()
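
One way to get a CI from a p-value, sketched here under the assumption that the estimate is approximately normal on the scale it was tested on (for example a log odds ratio) and that the p-value is two-sided, is to back out the standard error from the z-score implied by the p-value. The function name `ci_from_p` and the example numbers are hypothetical.

```r
# Back-calculate a confidence interval from an estimate and its two-sided p-value.
# Assumes a normal (Wald-type) test of the estimate against zero.
ci_from_p <- function(estimate, p, level = 0.95) {
  z  <- qnorm(1 - p / 2)                       # z-score implied by the p-value
  se <- abs(estimate) / z                      # implied standard error
  half_width <- qnorm(1 - (1 - level) / 2) * se
  data.frame(estimate = estimate,
             lower = estimate - half_width,
             upper = estimate + half_width)
}

# Made-up example: log odds ratios and their p-values
res <- ci_from_p(estimate = c(0.5, 1.2, -0.3), p = c(0.05, 0.001, 0.40))
```

The `lower` and `upper` columns could then replace the simulated `ci` values in `ymin` and `ymax`.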