Question: Combine RNA-seq and ChIP-seq Data in a Heatmap in R
mijnadresje1987 wrote, 6.6 years ago:

Dear all,

I have one RNA-seq dataset and one ChIP-seq dataset, and I would like to make a heatmap from this data. The only thing the two have in common is the gene names. I can't figure out how to either (a) make two separate heatmaps of this data and then combine them into one panel, or (b) make one heatmap of both datasets in which the columns are colored independently of each other, so that each column has a different color palette. Any ideas?

Many thanks for your help.

This is a sample of my dataset:

                   TAGS       log2
ENSMUSG00000000103   39 -3.7508929
ENSMUSG00000000127   17  0.9289728
ENSMUSG00000000131   15  0.1310221
ENSMUSG00000000134   15  0.8215449
ENSMUSG00000000149   15 -0.5754766
ENSMUSG00000000157   97 -0.2849805

ENSMUSG... = gene name (Ensembl mouse gene ID)
TAGS = ChIP-seq data
log2 = RNA-seq data

Each of the two columns in the heatmap needs its own color palette.
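For anyone who wants to try the answers below on real values, the sample above can be read straight into R. This is a minimal sketch using base R's read.table(): because the header line has one field fewer than the data lines, the first column (the gene IDs) becomes the row names. The object name expr is only an illustrative choice.

```r
# Read the pasted sample; the header has one field fewer than the data rows,
# so read.table() uses the first column (Ensembl IDs) as row names
expr <- as.matrix(read.table(header = TRUE, text = "
TAGS log2
ENSMUSG00000000103 39 -3.7508929
ENSMUSG00000000127 17 0.9289728
ENSMUSG00000000131 15 0.1310221
ENSMUSG00000000134 15 0.8215449
ENSMUSG00000000149 15 -0.5754766
ENSMUSG00000000157 97 -0.2849805
"))

expr  # numeric matrix: one row per gene, columns TAGS (ChIP-seq) and log2 (RNA-seq)
```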


I have answered a similar question before, with code, here: "Superimpose 2 Heatmaps".

(reply by Pavel Senin, 6.6 years ago)

I've tried your code, and it worked fine until I loaded my real dataset of 3500 rows. With this dataset the pictures are not shown. Any ideas how I can solve this without reducing the size of my dataset?

(reply by mijnadresje1987, 6.6 years ago)

I see. Could you say where exactly it fails or hangs? Is it the ggplot2 part?

(reply by Pavel Senin, 6.6 years ago)
Michael Dondrup (Bergen, Norway) wrote, 6.6 years ago:

I think this should be easy: combine the data into a single matrix and then plot it with, e.g., the heatmap() function or heatmap.2() from the gplots package. The most important thing to remember, though, is to scale the columns of the resulting matrix, because you are combining measurements with different means and variances.

If you also want to scale the rows, then for the same reason I would scale the RNA-seq and ChIP-seq rows separately before combining them. This can be done with the scale() function.
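A minimal base-R sketch of this suggestion, assuming the sample rows from the question: combine the two measurements into one matrix, scale each column with scale(), and plot it with heatmap(). The name expr is illustrative; heatmap.2() from gplots could be used in a similar way.

```r
# Sample rows from the question (gene IDs become row names)
expr <- as.matrix(read.table(header = TRUE, text = "
TAGS log2
ENSMUSG00000000103 39 -3.7508929
ENSMUSG00000000127 17 0.9289728
ENSMUSG00000000131 15 0.1310221
ENSMUSG00000000134 15 0.8215449
ENSMUSG00000000149 15 -0.5754766
ENSMUSG00000000157 97 -0.2849805
"))

# Center and scale each column separately (mean 0, sd 1), so ChIP-seq tag
# counts and RNA-seq log2 values end up on a comparable range
expr.scaled <- scale(expr)

# scale = "none" because the columns are already scaled; heatmap()'s default
# (scale = "row") would instead rescale each gene across the two columns
heatmap(expr.scaled, scale = "none")
```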


I made the combined heatmap with the heatmap() function, but is it also possible to give each of the two columns its own color palette? For example, the ChIP-seq data in reds, oranges and yellows, and the RNA-seq data in blues and greens.

(reply by mijnadresje1987, 6.6 years ago)

You may be better served by building the graphics separately and combining them afterwards in Illustrator or Inkscape. Alternatively, you could use base graphics to build this up in R.

(reply by Sean Davis, 6.6 years ago)
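As a sketch of the base-graphics route, the two columns can be drawn in one panel with two separate image() calls, so each column maps its own value range onto its own palette. The palettes (yellow-orange-red for ChIP-seq, green-blue for RNA-seq) follow the colors asked for above and are otherwise arbitrary; expr is the sample from the question. A color key for each column would still have to be added separately.

```r
expr <- as.matrix(read.table(header = TRUE, text = "
TAGS log2
ENSMUSG00000000103 39 -3.7508929
ENSMUSG00000000127 17 0.9289728
ENSMUSG00000000131 15 0.1310221
ENSMUSG00000000134 15 0.8215449
ENSMUSG00000000149 15 -0.5754766
ENSMUSG00000000157 97 -0.2849805
"))

n <- nrow(expr)
ybreaks <- seq(0.5, n + 0.5)  # cell boundaries: one row per gene

# Illustrative palettes, one per data type
chip.cols <- colorRampPalette(c("yellow", "orange", "red"))(64)
rna.cols  <- colorRampPalette(c("green", "blue"))(64)

par(mar = c(3, 12, 2, 1))  # wide left margin for the gene IDs
# Column 1 (ChIP-seq): cell spans x = 0.5..1.5; colors use this column's range
image(x = c(0.5, 1.5), y = ybreaks, z = matrix(expr[, "TAGS"], nrow = 1),
      col = chip.cols, xlim = c(0.5, 2.5), axes = FALSE, xlab = "", ylab = "")
# Column 2 (RNA-seq): added to the same plot with its own palette and range
image(x = c(1.5, 2.5), y = ybreaks, z = matrix(expr[, "log2"], nrow = 1),
      col = rna.cols, add = TRUE)
axis(1, at = 1:2, labels = colnames(expr), tick = FALSE)
axis(2, at = seq_len(n), labels = rownames(expr), las = 1, tick = FALSE,
     cex.axis = 0.7)
```

The same two calls work unchanged for 3500 rows, since image() draws filled cells without borders.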
Pavel Senin (Los Alamos, NM) wrote, 6.6 years ago:

Here you go with 3500 rows:

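library(reshape)    # melt(); reshape2::melt() would name the columns Var1/Var2, not X1/X2
library(ggplot2)
library(scales)     # rescale()
library(gridExtra)  # grid.arrange()

genes <- paste0("gene", 1:3500)

# With 3500 rows each tile is very thin, so the gene labels are unreadable and
# a white tile border (colour = "white") can cover the fill completely.
# Hide the labels and leave out the border.
no.labels <- theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())

data1 <- matrix(runif(3500), ncol = 1, dimnames = list(genes, "set1"))
data1.m <- melt(data1)
p1 <- ggplot(data1.m, aes(X2, X1)) + geom_tile(aes(fill = value)) +
  scale_fill_gradient(low = "white", high = "steelblue") + ggtitle("data 1") + no.labels

data2 <- matrix(runif(3500), ncol = 1, dimnames = list(genes, "set2"))
data2.m <- melt(data2)
p2 <- ggplot(data2.m, aes(X2, X1)) + geom_tile(aes(fill = value)) +
  scale_fill_gradient(low = "white", high = "steelblue") + ggtitle("data 2") + no.labels

# Combined column: sum of both sets rescaled to [0, 1], next to the originals
d <- cbind(rescale(data1 + data2), data1, data2)
colnames(d) <- c("combined", "set1", "set2")
d.m <- melt(d)
p3 <- ggplot(d.m, aes(X2, X1)) + geom_tile(aes(fill = value)) +
  scale_fill_gradient(low = "white", high = "steelblue") + ggtitle("data 1 + data 2") + no.labels

# grid.arrange() draws the three plots side by side; in gridExtra >= 2.0,
# arrangeGrob() only builds the layout, so print(arrangeGrob(...)) draws no plot
grid.arrange(p1, p2, p3, ncol = 3)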

[Figure: all three maps]


Dear seninp, when I use your exact code, I do not get the blue horizontal lines (which represent the data) next to the grey area (which represents the gene names). R does not give any error message.

It looks like the data are not shown in the image I create with your 3500 rows of data.

If I use only a subset, e.g. 30 rows, it works.

Could you please let me know your thoughts?

(reply by mijnadresje1987, 6.6 years ago)

That is strange indeed. Honestly, I do not know; I am not an expert in this sort of issue. What I can say is that I run Ubuntu (kernel 3.2.0-38-generic, x86_64) with R version 2.15.2 (2012-10-26) and up-to-date libraries, on an Intel Xeon workstation with 8 GB of memory. The code above produces the plot in about 20 seconds. What is your setup?

(reply by Pavel Senin, 6.6 years ago)

I'm using a Windows 7 machine with 8 GB RAM. I couldn't see the picture in the plot panel, but when I saved it as a PDF and zoomed in a lot, the heatmaps became visible. My picture is now very large and therefore useless. Do you know a way to adjust the scale or something like that?

The other thing I noticed was that when I sorted the data (from high to low), the colors were not nicely ordered from red to green. Say that red is the highest value and green the lowest: there were also green colors among the highest values. All genes with the same value need to have the same color. Any ideas on these problems?

(reply by mijnadresje1987, 6.6 years ago)

It is a fact that R has a pretty steep learning curve, but you'll probably need to do some work on your own to determine how best to get what you want. The pdf() function takes both width and height as arguments; change them as you see fit, but with 3500 genes you will need a VERY large PDF to distinguish individual genes. As for sorting, see the order() and sort() functions and consider using them before plotting. Finally, you may want to find a local R expert to help you (http://r-users-group.meetup.com/).

(reply by Sean Davis, 6.6 years ago)
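A minimal sketch combining both suggestions in base graphics, with random values standing in for the real data: order() sorts the genes from high to low before plotting, and pdf()'s width and height (in inches) set the page size. If you stay with ggplot2, one likely reason the sorted colors looked shuffled is that a discrete axis follows the factor levels, not the row order of the data frame, so the gene factor's levels would also have to be set in sorted order. The file name sorted_heatmap.pdf is only an example.

```r
set.seed(1)  # random stand-in values; replace with your own named vector
vals <- setNames(runif(3500), paste0("gene", 1:3500))

# order() returns the permutation that sorts the values from high to low
vals.sorted <- vals[order(vals, decreasing = TRUE)]

pal <- colorRampPalette(c("green", "red"))(100)  # low = green, high = red

# width and height are in inches; a tall page gives each gene more room
pdf("sorted_heatmap.pdf", width = 4, height = 40)
par(mar = c(1, 1, 1, 1))
# rev() puts the highest value at the top, because image() draws y upwards
image(x = c(0.5, 1.5), y = seq(0.5, length(vals.sorted) + 0.5),
      z = matrix(rev(vals.sorted), nrow = 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
dev.off()
```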

Glad that it worked for you! I would suggest trying a Cairo device; at least for me, it works better for PDF and PS: http://cran.r-project.org/package=cairoDevice. I think 3.5K genes is quite a large number to look at. Partitioning them by value, by gene name, or by functional group into multiple figures might help.

(reply by Pavel Senin, 6.6 years ago)
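One way to sketch the partitioning idea, again with random stand-in values: cut() splits the genes into five groups by value quantile (the number of groups is an arbitrary choice), and each group is drawn on its own page of one multi-page PDF. Passing the same zlim to every image() call keeps a single color scale across pages, so equal values get the same color everywhere. Grouping by gene name or functional category would work the same way with a different grouping factor. The file name heatmap_parts.pdf is illustrative.

```r
set.seed(1)  # random stand-in values; replace with your own named vector
vals <- setNames(runif(3500), paste0("gene", 1:3500))

# Five groups by value quantile; include.lowest keeps the minimum in Q1
groups <- cut(vals, breaks = quantile(vals, probs = seq(0, 1, 0.2)),
              include.lowest = TRUE, labels = paste0("Q", 1:5))
parts <- split(vals, groups)

pal <- colorRampPalette(c("white", "steelblue"))(100)
zlim <- range(vals)  # one shared color scale for all pages

pdf("heatmap_parts.pdf", width = 4, height = 12)  # one page per group
for (g in names(parts)) {
  v <- sort(parts[[g]], decreasing = TRUE)
  image(x = c(0.5, 1.5), y = seq(0.5, length(v) + 0.5),
        z = matrix(rev(v), nrow = 1), col = pal, zlim = zlim,
        axes = FALSE, xlab = "", ylab = "", main = g)
}
dev.off()
```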