Question

make volcano plot from multiple dataframes with same genes and different foldchanges

0

15 months ago

Chironex ▴ 40

My question is probably not new, but I haven't found a fully satisfactory answer. I have multiple data frames with the same columns, each representing a cluster. The genes are the same in every data frame, but they are expressed differently, so their fold changes and p-values differ. I would like to create a volcano plot that combines all these data frames and plots them together, coloring the genes by cluster so I can tell which cluster each point belongs to. The catch is that the genes are not unique to one data frame; they are often repeated across data frames. Is it possible to do this?

r • 1.4k views

updated 15 months ago by seidel 11k • written 15 months ago by Chironex ▴ 40

1

15 months ago

Ming Tommy Tang ★ 3.9k

You will need to prefix each gene name with its cluster ID; you can then concatenate all the data frames and plot as usual.
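A minimal sketch of that idea in base R, in case it helps. The input list `cluster_dfs` and the column names `gene`, `logFC` and `pvalue` are assumptions for illustration (they are not from the question), and the data are simulated:

```r
# Hypothetical input: a named list of per-cluster data frames that share
# the columns gene, logFC and pvalue (simulated here).
set.seed(1)
genes <- paste0("g", 1:50)
cluster_dfs <- lapply(
  c(c1 = 1, c2 = 2, c3 = 3),
  function(i) data.frame(gene = genes, logFC = rnorm(50, sd = 2), pvalue = runif(50))
)

# Tag each row with its cluster and build a unique "cluster_gene" label.
tagged <- Map(function(df, id) {
  df$cluster <- id
  df$label <- paste(id, df$gene, sep = "_")
  df
}, cluster_dfs, names(cluster_dfs))

# Concatenate everything into one long data frame.
combined <- do.call(rbind, tagged)
rownames(combined) <- combined$label
combined$cluster <- factor(combined$cluster)

# Volcano plot with one color per cluster.
cols <- rainbow(nlevels(combined$cluster))
plot(combined$logFC, -log10(combined$pvalue),
     col = cols[as.integer(combined$cluster)], pch = 19,
     xlab = "logFC", ylab = "-log10(p-value)")
legend("topright", legend = levels(combined$cluster), col = cols, pch = 19)
```

Keeping a separate `cluster` column alongside the prefixed label means the original gene name is still available for filtering or labeling later.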

score 2 · Accepted Answer · 2022-12-30

It's quite possible to do what you want. The base plot functions in R make overplotting easy: you add sets of points to a plot layer by layer. If your data frames are in a list, you can use the apply family of functions. Here's a not-so-pretty example using mapply(), which can iterate through two things at once (a list of data frames and a vector of colors).

# create a list of 5 data frames,
# each with uniquely skewed data
df_list <- lapply(1:5, function(i) {
  ex <- rnorm(50) + rnorm(1, 0, 2)  # logFC-like values with a random per-cluster shift
  sig <- abs(rnorm(50))             # stand-in for a significance score
  df <- data.frame(ex = ex, sig = sig)
  rownames(df) <- paste0("g", 1:50)
  df
})

# create a blank plot (points outside xlim/ylim will not be drawn)
plot(1, 1, type = "n", xlim = c(-4, 4), ylim = c(0, 3), xlab = "logFC", ylab = "sig")

# create some colors, one per data frame
plotcolors <- rainbow(5)
# iterate through df_list and plot the points for each data frame;
# invisible() hides the list of NULLs that mapply() returns
invisible(mapply(function(df, p) {
  points(df$ex, df$sig, col = p, pch = 19)
}, df_list, plotcolors))

The first section uses lapply() to create a list of five data frames, each with two columns containing 50 data points vaguely resembling volcano-plot data skewed in a given direction. In this case, each data frame has the same set of genes (but it doesn't matter what the gene names are, because the points are plotted by value, not by name). The second part uses mapply() to loop through the list of data frames and the vector of plot colors in parallel, drawing each data frame's points on the plot.
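One thing to watch in the example above: the hard-coded xlim and ylim can clip a cluster whose random shift is large. Here is a hedged variant that derives the axis limits from the data and adds a legend. The cluster names and the seed are made up for illustration:

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# same kind of simulated data as above: 5 data frames, 50 genes each
df_list <- lapply(1:5, function(i) {
  data.frame(ex = rnorm(50) + rnorm(1, 0, 2), sig = abs(rnorm(50)),
             row.names = paste0("g", 1:50))
})
names(df_list) <- paste0("cluster", 1:5)  # hypothetical cluster names

# axis limits that cover every data frame
xlim <- range(unlist(lapply(df_list, function(df) df$ex)))
ylim <- c(0, max(unlist(lapply(df_list, function(df) df$sig))))

# blank plot, then one layer of points per data frame
plot(1, 1, type = "n", xlim = xlim, ylim = ylim, xlab = "logFC", ylab = "sig")
plotcolors <- rainbow(length(df_list))
invisible(mapply(function(df, p) points(df$ex, df$sig, col = p, pch = 19),
                 df_list, plotcolors))

# legend mapping colors back to clusters
legend("topright", legend = names(df_list), col = plotcolors, pch = 19)
```

With real results you would swap `sig` for something like -log10 of your p-value column.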

Of course, if you had a few data frames and wanted to simply plot them one by one (no loop), it's straightforward:

# plot all the data in grey
# (logFC and significance are placeholders for vectors covering every gene)
plot(logFC, significance, col="grey")
# overlay your first cluster: df1
points(df1$logFC, df1$significance, col=yourFavoriteColor1)
# overlay your second cluster: df2
points(df2$logFC, df2$significance, col=yourFavoriteColor2)
# etc.