Question: Scatter Plot For Correlations With Heatdensity
k.nirmalraman (Germany) wrote:

Hi All,

I am trying to show some correlation between two samples and would like to do a scatter plot for the same.

I tried the following with ggplot2, but I am wondering if it's possible to get the heat density as shown here:

qplot(x, y, data = data) + geom_abline(colour = "red", size = 1) + theme_bw()

[Image: what I got]

I would like a scatter plot as shown below.

[Image: Correlation Plot]

Can you help me achieve this? Thanks!!

R visualization • 15k views
modified 4.5 years ago by Jason • written 5.5 years ago by k.nirmalraman
Jason (United States) wrote:

I believe the plots you want were originally made with the R package LSD. It's worth checking out: the package is easy to use and makes great-looking plots. It can also calculate different types of correlation for you, e.g. Spearman and Pearson.

 

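# Simulate two uncorrelated samples; swap in your own data
library(LSD)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x, y)
# Scatter plot with points coloured by local density
heatscatter(DF[, 1], DF[, 2])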
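
The correlation feature mentioned above can be sketched as follows. This is a minimal example on simulated data, assuming the `cor` and `method` arguments of `heatscatter` behave as documented in the LSD manual; the variable names and the simulated relationship are illustrative only. The base R `cor()` calls give the same statistics without LSD.

```r
# Simulated, correlated data (illustrative only; swap in your own samples)
set.seed(42)
n <- 10000
x <- rnorm(n)
y <- 0.7 * x + rnorm(n, sd = 0.5)

# Correlations computed directly with base R
r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")

# If LSD is installed, draw the heat scatter and ask it to report the
# correlation in the plot title (cor = TRUE), here using Pearson
if (requireNamespace("LSD", quietly = TRUE)) {
  LSD::heatscatter(x, y, cor = TRUE, method = "pearson", main = "x vs y")
}
```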

 

Edit: for some reason the image I generated isn't appearing (I copied and pasted it). Any help from a mod? It looks like the images in this link.

modified 4.5 years ago by Devon Ryan • written 4.5 years ago by Jason

Dragging and dropping an image (it looks like that's what you did) won't work. You need to post the image elsewhere and then just link to it.
 

written 4.5 years ago by Devon Ryan

I have a question: how can you add a label next to a heatscatter (which I am loving as my new fav graph) to show the viewer the difference in point density between two colours? Something along the lines of this:

Red = 10-20 data points overlapping

Blue = 0-10 data points overlapping

 

Is it even possible? It seems like something my PI would like to see, as would I.

written 4.1 years ago by james.lloyd80

I don't know exactly how to do that, but maybe the contour function will help. I asked a developer of LSD (Bjoern Schwalb), who actually posts a fair amount here, what the values along the contours mean (see add.contour = TRUE), and he told me "the values shown are density estimates from a 2D Kernel Density Estimator function that is used internally (KDE2D)". For my presentations and a recent manuscript I just made a color bar that goes from blue to red, with the other colors in between, and said that red is high density and blue is low density. Most people are generally satisfied with that as long as you mention the sample size (e.g. n = 100). The publications I've seen that use heat scatter plots have never specified exactly how many data points overlap.
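
A color bar like the one described above can be sketched in base R. The five-color ramp and the axis labels here are assumptions for illustration; the ramp is not LSD's exact "heat" palette, so match the colors to your own plot before using it in a figure.

```r
# Hypothetical blue-to-red ramp with intermediate colors
# (an approximation, not the palette LSD uses internally)
pal <- colorRampPalette(c("blue", "cyan", "green", "yellow", "red"))(100)

# Draw a horizontal strip of 100 colored cells.
# x and y are given as cell boundaries: 101 breaks in x, 2 in y.
op <- par(mar = c(4, 1, 2, 1))
image(x = seq(0, 1, length.out = 101),
      y = c(0, 1),
      z = matrix(seq_len(100), ncol = 1),
      col = pal, axes = FALSE,
      xlab = "point density (n = 10000)", ylab = "")
axis(1, at = c(0, 1), labels = c("low", "high"))
par(op)
```

The bar only conveys relative density (low to high), which matches the approach described in the reply; it does not give counts of overlapping points.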

You may want to look into hexbin if finding the number of data points overlapping is really important. I think it's supposed to do a good job of performing that task: http://www.statmethods.net/graphs/scatterplot.html (it's under high density scatter plots)
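
A minimal hexbin sketch on simulated data, for the case where actual counts matter. The choice of `xbins = 40` is arbitrary, and the data are placeholders. Unlike a kernel density estimate, each hexagon here holds an exact count of the points that fall inside it, and the default plot legend reports those counts.

```r
library(hexbin)

# Simulated data (illustrative only; swap in your own samples)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# Bin the points into hexagons; xbins sets the number of bins along x
bins <- hexbin(x, y, xbins = 40)

# Every point lands in exactly one hexagon, so the counts sum to n
total <- sum(bins@count)

# Plot with a legend showing the number of points per hexagon
plot(bins, xlab = "x", ylab = "y")
```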

This may also help for future LSD work if you hadn't seen it already: http://cran.fhcrc.org/web/packages/LSD/LSD.pdf

written 4.1 years ago by Jason

This is awesome! Wish I could save the plots as ggplot objects though

written 2.6 years ago by sviatoslav.kendall470
Irsan (Amsterdam) wrote:

Adapted from Stack Overflow

# Generate random data; swap this for yours :-)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x, y)

# Calculate the 2D kernel density over a grid
library(MASS)
dens <- kde2d(x, y)

# Create a new data frame from that 2D density grid.
# expand.grid() varies x fastest, which matches as.vector(dens$z)
# because the rows of dens$z correspond to x and the columns to y.
gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a loess surface to the gridded density
mod <- loess(zgr ~ xgr * ygr, data = gr)

# Apply the model to the original data to estimate the density at each point
DF$pointdens <- predict(mod, newdata = data.frame(xgr = x, ygr = y))

# Draw the plot
library(ggplot2)
ggplot(DF, aes(x = x, y = y, color = pointdens)) +
  geom_point() +
  scale_colour_gradientn(colours = rainbow(5)) +
  theme_bw()

[Image: Scatterplot with points coloured according to the number of points in that area]
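
Whether the z values line up with the expanded grid can be checked directly. The sketch below uses a deliberately non-square grid (4 by 6, an arbitrary choice) so that a row/column mix-up would be detected, and it relies on the documented layout of `kde2d` output, where rows of `z` correspond to `x` and columns to `y`.

```r
library(MASS)

# Small simulated example with different spreads in x and y
set.seed(1)
x <- rnorm(500)
y <- rnorm(500, sd = 3)

# Non-square grid: 4 grid points in x, 6 in y
dens <- kde2d(x, y, n = c(4, 6))

# Same construction as in the answer above
gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# For each row of gr, look up the grid indices and compare with dens$z[i, j]
i <- match(gr$xgr, dens$x)
j <- match(gr$ygr, dens$y)
order_ok <- all(gr$zgr == dens$z[cbind(i, j)])
order_ok
```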

modified 5.5 years ago • written 5.5 years ago by Irsan

@Irsan: This is what I would like to generate for my gene expression data, where x and y each have dimension 4 x 15000, to show the correlation between all gene pairs in x and y. Could you please suggest how I should modify my data to obtain a scatterplot of gene expression based on heat density?

written 5.1 years ago by spaul850510

Yes, what does your data look like now?

written 5.1 years ago by Irsan

My data is initially two data frames of dimension 15000 x 4 each, where the rows are the genes and the columns are the samples. For these two data frames, I would like to make a scatterplot of gene correlation density.
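
The thread does not settle how two such data frames should be reduced to x and y vectors, so the following is only a sketch of two possible readings, using small simulated stand-ins (200 genes by 4 samples; the names `A` and `B` are hypothetical). Either pair of vectors could then be passed to a density-coloured scatter plot.

```r
# Simulated stand-ins for the two expression data frames
# (rows = genes, columns = samples); replace with real data
set.seed(1)
genes <- 200
samples <- 4
mA <- matrix(rnorm(genes * samples, mean = 8), nrow = genes)
mB <- mA + matrix(rnorm(genes * samples, sd = 0.5), nrow = genes)
A <- as.data.frame(mA)
B <- as.data.frame(mB)

# Reading 1 (assumption): one point per gene, averaging over samples
x <- rowMeans(A)
y <- rowMeans(B)

# Reading 2 (assumption): one point per gene-sample pair,
# flattening both data frames column by column in the same order
x_all <- unlist(A, use.names = FALSE)
y_all <- unlist(B, use.names = FALSE)

# Correlation between the two conditions under reading 1
r <- cor(x, y)
```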

written 5.1 years ago by spaul850510

OK, clear. I will get back to you at the end of next week. Leaving for holiday now.

written 5.1 years ago by Irsan

This is the same question as "Scatterplots Showing Correlation Between Gene Pairs", right?

modified 5.1 years ago • written 5.1 years ago by Irsan