Question: R code to plot heatmap/average line plots from Deeptools computeMatrix
gravatar for 94133
2.5 years ago by
9413310 wrote:

Deeptools is an incredible package, but I want more plotting versatility.

Anyone kind enough to share workflow/code for plotting computeMatrix heatmaps and average plots in R (ggplot, pheatmap etc.)?

Other suggestions for plotting from .bw coverage files and .bed feature files?

Thanks a lot!!!

sequencing chip-seq R • 3.7k views
ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by 9413310

I think deeptools have pretty nice adjustments to generate very good heatmaps/profile plots. What do you think is missing?

Check this: Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)

ADD REPLYlink written 2.5 years ago by venu6.7k

I agree that the plots are nice and an easy way to visualize the data, but don't like the aesthetic for publication quality figures. I think there are way more attractive plots being published using other packages and want to go that route (i.e. matplotlib, pheatmap, ggplot etc).

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by 9413310

In regards to heatmaps I can assure you that it will require a lot of fiddling around to achieve similarly good-looking results in R (which is something you've realized already, hence the question, I assume). EDIT: I have never had enough patience so far to go further than what Devon has supplied. And I much prefer R for graphics.

The aspects that I tend to want to change about the deepTools heatmaps are mostly concerning the labels. Those can more easily be changed by going the python route (i.e., using the original source code and changing the aspects you feel need change).

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Friederike6.5k

You are correct ;)

Anyone have a better solution?

ADD REPLYlink written 2.5 years ago by 9413310

@Kevin has some nice links here: How can I generate Heat Map with dendograms, and PCA analysis in "R Programming" for this type of data?

ADD REPLYlink written 2.5 years ago by genomax92k

have you looked through the links venu provided?

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Friederike6.5k

FYI, matplotlib is exactly what's used to create plots by deepTools.

ADD REPLYlink written 2.5 years ago by Devon Ryan97k

can you specify which aspects you don't find attractive enough?

ADD REPLYlink written 2.5 years ago by Friederike6.5k

Thanks so much for your help everyone! I'll definitely start by tweaking the Python code in deepTools and dig into the R code when I have time.

ADD REPLYlink written 2.5 years ago by 9413310
gravatar for Devon Ryan
2.5 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

I'm not sure it'd make sense to tidy a matrix of that size for use with ggplot2. Below is how to read a simple file produced by computeMatrix in R (the example file I'm using below can be found in the deepTools source code under deeptools/test/test_data/):

m = read.delim("computeMatrixOperations.mat.gz", skip=2, header=F)
m = as.matrix(m[,-c(1:6)])

The resulting matrix is exactly what's plotted in plotHeatmap, though you can also use pheatmap() if you really want:

pheatmap(m, cluster_rows=F, cluster_cols=F)  # This is a useless heatmap, I just use it for testing

Note that there are likes NAs in the data, which cause a lot of things to have issues during clustering.

One big caveat to all of this is that you don't see delineations by region groups and samples. That information is stored in the header:

> h = scan("computeMatrixOperations.mat.gz", n=1, sep="\n", what=character())
Read 1 item
> h
[1] "@{\"verbose\":true,\"scale\":1,\"skip zeros\":false,\"nan after end\":false,\"sort using\":\"mean\",\"unscaled 5 prime\":0,\"body\":1000,\"sample_labels\":[\"SRR648667.forward\",\"SRR648668.forward\",\"SRR648669.forward\",\"SRR648670.forward\",\"SRR648667.reverse\",\"SRR648668.reverse\",\"SRR648669.reverse\",\"SRR648670.reverse\"],\"downstream\":0,\"unscaled 3 prime\":0,\"group_labels\":[\"genes\"],\"bin size\":10,\"upstream\":0,\"group_boundaries\":[0,196],\"sample_boundaries\":[0,100,200,300,400,500,600,700,800],\"missing data as zero\":false,\"ref point\":null,\"min threshold\":null,\"sort regions\":\"no\",\"proc number\":20,\"bin avg type\":\"mean\",\"max threshold\":null}"

That's a json string, which I don't think base R has a function to parse. The important part is that the bit after sample_boundaries denotes the beginning and end of each sample (the labels are in sample_labels) and group_boundaries/group_labels does the same for groups of regions. So you can subset matrices in the normal way in R accordingly.

For computing the equivalent of plotProfile's output, you can simply apply(m, 2, function(x) mean(x, na.rm=T) (be sure to handle NA values!).

I usually prefer creating plots in R, but if it'd be a good bit of work to just reproduce a moderately complicated equivalent to plotHeatmap in R, so I would encourage you to take the advice from Friederike and tweak the python code in deepTools instead.

ADD COMMENTlink written 2.5 years ago by Devon Ryan97k

Quick note: shouldn't skip=2 be skip=1? As far as I can tell from my data, the second line contains the first genomic region and thus should be included.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by A. Domingues2.4k

Yes, I think you're correct, it should be skip=1 rather than skip=2. Good catch!

ADD REPLYlink written 2.2 years ago by Devon Ryan97k

Apropos, the json parsing is very easy (not base R):

h <- scan(mat_file, n=1, sep="\n", what=character())
params <- fromJSON(txt=gsub("@", "", h))
ADD REPLYlink written 2.2 years ago by A. Domingues2.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2280 users visited in the last hour