Scatter Plot For Correlations With Heatdensity
2
7
7.9 years ago
k.nirmalraman ★ 1.1k

Hi All,

I am trying to show some correlation between two samples and would like to do a scatter plot for the same.

I tried the following with ggplot2, but I am wondering whether it's possible to get the heat density as shown here:

qplot(x,y,data=data)+geom_abline(colour = "red", size = 1)+theme_bw()

I would like a scatter plot as shown below.

Can you help me achieve this? Thanks!!

visualization r • 18k views
12
6.9 years ago
Jason ▴ 900

I believe the plots you asked about were originally made with the R package LSD. I would check it out: its commands are easy to use and produce great-looking plots. It can also calculate different types of correlation for you, e.g. Spearman and Pearson.

n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x,y)
library(LSD)
heatscatter(DF[,1],DF[,2])


Edit: for some reason the image I generated isn't appearing (I copied and pasted it). Any help from a mod? It looks like the images in this link.
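The correlation coefficients mentioned above can also be computed directly with base R's cor(), independently of any plotting package. This is only a minimal sketch: the correlated toy data is an assumption made for illustration, so swap in your own two samples.

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # hypothetical correlated data, replace with yours

# Pearson measures linear association, Spearman works on ranks
r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")

# Plain scatter plot with both coefficients in the title and a fitted line
plot(x, y, pch = 19, cex = 0.3,
     main = sprintf("Pearson = %.2f, Spearman = %.2f", r_pearson, r_spearman))
abline(lm(y ~ x), col = "red", lwd = 2)
```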

1

Dragging and dropping an image (it looks like that's what you did) won't work. You need to post the image elsewhere and then just link to it.

0

I have a question: how can you add a label next to a heatscatter (which I am loving as my new fav graph) to show the viewer how the density of points differs between two colours? Something along the lines of this:

Red = 10-20 data points overlapping

Blue = 0-10 data points overlapping

Is it even possible? It seems like something my PI would like to see, as would I.

1

I don't know exactly how to do that, but maybe the contour function will help. I asked a developer of LSD (Bjoern Schwalb), who actually posts a decent amount here, what the values along the contour mean (see add.contour = TRUE), and he told me "the values shown are density estimates from a 2D Kernel Density Estimator function that is used internally (KDE2D)". For my presentations and a recent manuscript I just made a color bar that goes from blue to red, with the other colors in between, and said that red is high density and blue is low density. Most people are generally satisfied with that as long as you mention the sample size (e.g. n = 100). The publications I've seen that use heat scatter have never specified exact numbers of overlapping data points.

You may want to look into hexbin if finding the number of data points overlapping is really important. I think it's supposed to do a good job of performing that task: http://www.statmethods.net/graphs/scatterplot.html (it's under high density scatter plots)

This may also help for future LSD work if you hadn't seen it already: http://cran.fhcrc.org/web/packages/LSD/LSD.pdf
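The binning idea mentioned above can be sketched with ggplot2 alone, which gives a legend in actual point counts rather than density estimates. This is a rough sketch, not what LSD does internally: it swaps the kernel density for rectangular bins via geom_bin2d(), and the 50-bin grid and the colour ramp are arbitrary choices made here.

```r
library(ggplot2)

set.seed(42)
n <- 10000
DF <- data.frame(x = rnorm(n), y = rnorm(n))  # toy data, replace with yours

# Bin the points on a 50 x 50 grid; the fill legend reports points per bin
p <- ggplot(DF, aes(x = x, y = y)) +
  geom_bin2d(bins = 50) +
  scale_fill_gradientn(colours = c("blue", "green", "yellow", "red"),
                       name = "points per bin") +
  theme_bw()

print(p)
```

Because p is an ordinary ggplot object, it can be stored, extended with further layers, or written to disk with ggsave().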

0

This is awesome! I wish I could save the plots as ggplot objects, though.

7
7.9 years ago
Irsan ★ 7.3k

# Generate random data, swap this for yours :-)!
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x,y)

# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)

# create a new data frame of that 2d density grid
# (needs checking that I haven't stuffed up the order here of z?)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)

# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))

# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point() + scale_colour_gradientn(colours = rainbow(5)) + theme_bw()
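As a possible alternative to the loess step, the density at each point can be read straight off the kde2d grid. This is only a sketch of a different technique, not the method used above: it takes the value of the grid node at or just below each point (no interpolation), and the 100 x 100 grid size is an assumption chosen to keep that approximation fine enough.

```r
library(MASS)

set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x, y)

# 2D kernel density on a 100 x 100 grid spanning the range of the data
dens <- kde2d(x, y, n = 100)

# For every point, find the grid node at or just below it on each axis
# (pmax guards against an index of 0 at the lower edge)
ix <- pmax(findInterval(x, dens$x), 1L)
iy <- pmax(findInterval(y, dens$y), 1L)

# Rows of dens$z follow x, columns follow y
DF$pointdens <- dens$z[cbind(ix, iy)]
```

The resulting pointdens column can be mapped to colour in the same ggplot call as above.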


0

@Irsan: This is what I would like to generate for my gene expression data of dimension x = 4 × 15000 and y = 4 × 15000, to show the correlation between all gene pairs in x and y. Could you please suggest how I should modify my data to obtain a scatter plot of gene expression coloured by heat density?

0

Yes, what does your data look like now?

0

My data is initially two data frames of dimension 15000 × 4 each, where the rows are the genes and the columns are the samples. For these two data frames, I would like to draw a scatter plot of gene correlation density.
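One possible reading of this layout, sketched below, is to reduce each table to one value per gene (the mean across its 4 samples) and plot one table against the other. This is an assumption about what is wanted, not a confirmed answer: exprA and exprB are hypothetical stand-ins for the two data frames, and the simulated values are only there to make the sketch run.

```r
set.seed(1)
n_genes <- 15000

# Hypothetical stand-ins for the two expression tables (genes x samples)
exprA <- matrix(rnorm(n_genes * 4, mean = 8), nrow = n_genes)
exprB <- exprA + matrix(rnorm(n_genes * 4, sd = 0.5), nrow = n_genes)

# One point per gene: mean expression across the 4 samples of each table
DF <- data.frame(x = rowMeans(exprA), y = rowMeans(exprB))

# Overall agreement between the two tables
rho <- cor(DF$x, DF$y, method = "spearman")
```

DF then has the same x/y shape as the toy data used in the answers above, so it can be passed to heatscatter(DF$x, DF$y) or to the ggplot2 code unchanged.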

0

OK, clear. I will get back to you at the end of next week. Leaving for holiday now.

0

This is the same question as "Scatterplots Showing Correlation Between Gene Pairs", right?