Volcano plot: Using Multiple results
2.5 years ago

Hello, I am trying to make a volcano plot of multiple microarray results in a single plot, annotating each result set with a different color. I know how to generate a volcano plot for a single result, but I can't apply the same logic to multiple results. I googled it but didn't find a satisfactory answer. The following is an example dataset:

Gene      Fold       pvalue    Fold       pvalue
LUZP1     -7.57373   0.0085     3.66623   0.0075
FAM71F1   -6.31787   0.00865   -6.23586   0.00013
MBD4      -5.81446   0.0087     2.68795   0.00264
HOXD13     4.85073   0.0234    -4.2358    0.0035
DNAJC2    -2.75493   0.0314     1.23585   0.045
CHD4      -2.49614   0.057      6.26588   0.0225

My question is how to make such a volcano plot. Any suggestions or guidance would be deeply appreciated.

2.5 years ago
shawn.w.foley ★ 1.3k

An alternative solution would be to use ggplot2. If you have results from multiple samples in df1, df2, and df3, then:

library(ggplot2)

# Tag each result table with its sample name and flag significant genes
df1$sample <- 'sample1'
df1$significant <- abs(df1$Fold) > 1 & df1$FDR < 0.05
df2$sample <- 'sample2'
df2$significant <- abs(df2$Fold) > 1 & df2$FDR < 0.05
df3$sample <- 'sample3'
df3$significant <- abs(df3$Fold) > 1 & df3$FDR < 0.05

# Stack the tables, then plot; geom_point() is added outside the ggplot() call
df.combine <- rbind(df1, df2, df3)
ggplot(df.combine, aes(x = Fold, y = -log10(FDR), col = significant, shape = sample)) +
  geom_point()

This will produce a volcano plot of fold change versus -log10(FDR), with each point colored by whether it is a significant difference (defined by both the absolute value of log2FC, i.e. the Fold column, and the FDR cutoff) and with a shape corresponding to the sample.

That being said, I think ATpoint makes a good suggestion with multiple separate plots on the same page, since I'd worry about overplotting.
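As a minimal sketch of the ggplot2 approach above, applied to the example table from the question: it assumes the two Fold/pvalue column pairs come from two result sets (labelled study1 and study2 here, which are made-up names), and it uses the raw p-values in place of FDR because the example has no FDR column. Color and shape are swapped relative to the snippet above, so that color marks the result set, as the question asked.

```r
library(ggplot2)

# Values copied from the question; "study1"/"study2" are assumed labels.
genes <- c("LUZP1", "FAM71F1", "MBD4", "HOXD13", "DNAJC2", "CHD4")
df1 <- data.frame(Gene = genes,
                  Fold = c(-7.57373, -6.31787, -5.81446, 4.85073, -2.75493, -2.49614),
                  pvalue = c(0.0085, 0.00865, 0.0087, 0.0234, 0.0314, 0.057),
                  sample = "study1")
df2 <- data.frame(Gene = genes,
                  Fold = c(3.66623, -6.23586, 2.68795, -4.2358, 1.23585, 6.26588),
                  pvalue = c(0.0075, 0.00013, 0.00264, 0.0035, 0.045, 0.0225),
                  sample = "study2")

# Stack both result sets and flag significant genes with the same cutoffs as above
# (p-value stands in for FDR here, which is an assumption).
df.combine <- rbind(df1, df2)
df.combine$significant <- abs(df.combine$Fold) > 1 & df.combine$pvalue < 0.05

# Color by result set, shape by significance.
p <- ggplot(df.combine, aes(x = Fold, y = -log10(pvalue), col = sample, shape = significant)) +
  geom_point(size = 3)
print(p)
```

With real data, the p-values would normally be replaced by a multiple-testing-adjusted column before applying the 0.05 cutoff.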

Yes, this can be a good method. Thank you @shawn.w.foley.

2.5 years ago
ATpoint 57k

Simply make the volcano plot with the first dataset using the standard plot command, then use points with a different color to add the data points from the other studies to the existing plot. Afterwards, use legend to add a proper legend matching colors to study names.

plot(study1$logFC, -log10(study1$pvalue), col="black")
points(study2$logFC, -log10(study2$pvalue), col="red")
(and so on...)

This can probably be done with a simple for loop. Make sure you check the data range for the x- and y-axes beforehand so that no added points fall outside the limits set by the first plot. Still, it will probably get a bit messy with so many data points. Independent plots on the same page, e.g. par(mfrow=c(2,2)) to get four plots on one page, might be better.
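The loop idea above could be sketched roughly as follows: compute shared axis limits across all studies first, draw an empty frame, add each study with points(), then add a legend. The list `studies`, the helper `make_study`, and the simulated values are made up for illustration; only the column names logFC and pvalue follow the snippet above.

```r
# Simulated stand-ins for real result tables (hypothetical data).
set.seed(1)
make_study <- function(n) data.frame(logFC = rnorm(n, sd = 2), pvalue = runif(n))
studies <- list(study1 = make_study(50), study2 = make_study(50), study3 = make_study(50))
cols <- c("black", "red", "blue")

# Shared limits so that no study's points fall outside the plotting region.
xlim <- range(unlist(lapply(studies, function(d) d$logFC)))
ylim <- range(unlist(lapply(studies, function(d) -log10(d$pvalue))))

# Empty frame spanning the shared limits, then one points() call per study.
plot(xlim, ylim, type = "n", xlab = "logFC", ylab = "-log10(p-value)")
for (i in seq_along(studies)) {
  points(studies[[i]]$logFC, -log10(studies[[i]]$pvalue), col = cols[i], pch = 19)
}
legend("topright", legend = names(studies), col = cols, pch = 19)
```

For separate panels instead of an overlay, calling par(mfrow = c(2, 2)) first and then plot() once per study, with the same xlim and ylim passed to each call, should keep the panels comparable.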

Thanks a lot @ATpoint. It worked, and since I am working with a very small number of genes, the plot is not that messy. I appreciate your suggestions.