R code to plot heatmap/average line plots from Deeptools computeMatrix
1
2
Entering edit mode
5.9 years ago
94133 ▴ 30

Deeptools is an incredible package, but I want more plotting versatility.

Anyone kind enough to share workflow/code for plotting computeMatrix heatmaps and average plots in R (ggplot, pheatmap etc.)?

Other suggestions for plotting from .bw coverage files and .bed feature files?

Thanks a lot!!!

ChIP-Seq sequencing R • 8.4k views
ADD COMMENT
1
Entering edit mode

I think deeptools have pretty nice adjustments to generate very good heatmaps/profile plots. What do you think is missing?

Check this: Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)

ADD REPLY
0
Entering edit mode

I agree that the plots are nice and an easy way to visualize the data, but don't like the aesthetic for publication quality figures. I think there are way more attractive plots being published using other packages and want to go that route (i.e. matplotlib, pheatmap, ggplot etc).

ADD REPLY
0
Entering edit mode

In regards to heatmaps I can assure you that it will require a lot of fiddling around to achieve similarly good-looking results in R (which is something you've realized already, hence the question, I assume). EDIT: I have never had enough patience so far to go further than what Devon has supplied. And I much prefer R for graphics.

The aspects that I tend to want to change about the deepTools heatmaps are mostly concerning the labels. Those can more easily be changed by going the python route (i.e., using the original source code and changing the aspects you feel need change).

ADD REPLY
0
Entering edit mode

You are correct ;)

Anyone have a better solution?

ADD REPLY
0
Entering edit mode

have you looked through the links venu provided?

ADD REPLY
0
Entering edit mode

FYI, matplotlib is exactly what's used to create plots by deepTools.

ADD REPLY
0
Entering edit mode

can you specify which aspects you don't find attractive enough?

ADD REPLY
1
Entering edit mode

Thanks so much for your help everyone! I'll definitely start by tweaking the Python code in deepTools and dig into the R code when I have time.

ADD REPLY
6
Entering edit mode
5.9 years ago

I'm not sure it'd make sense to tidy a matrix of that size for use with ggplot2. Below is how to read a simple file produced by computeMatrix in R (the example file I'm using below can be found in the deepTools source code under deeptools/test/test_data/):

m = read.delim("computeMatrixOperations.mat.gz", skip=2, header=F)
m = as.matrix(m[,-c(1:6)])

The resulting matrix is exactly what's plotted in plotHeatmap, though you can also use pheatmap() if you really want:

pheatmap(m, cluster_rows=F, cluster_cols=F)  # This is a useless heatmap, I just use it for testing

Note that there are likes NAs in the data, which cause a lot of things to have issues during clustering.

One big caveat to all of this is that you don't see delineations by region groups and samples. That information is stored in the header:

> h = scan("computeMatrixOperations.mat.gz", n=1, sep="\n", what=character())
Read 1 item
> h
[1] "@{\"verbose\":true,\"scale\":1,\"skip zeros\":false,\"nan after end\":false,\"sort using\":\"mean\",\"unscaled 5 prime\":0,\"body\":1000,\"sample_labels\":[\"SRR648667.forward\",\"SRR648668.forward\",\"SRR648669.forward\",\"SRR648670.forward\",\"SRR648667.reverse\",\"SRR648668.reverse\",\"SRR648669.reverse\",\"SRR648670.reverse\"],\"downstream\":0,\"unscaled 3 prime\":0,\"group_labels\":[\"genes\"],\"bin size\":10,\"upstream\":0,\"group_boundaries\":[0,196],\"sample_boundaries\":[0,100,200,300,400,500,600,700,800],\"missing data as zero\":false,\"ref point\":null,\"min threshold\":null,\"sort regions\":\"no\",\"proc number\":20,\"bin avg type\":\"mean\",\"max threshold\":null}"

That's a json string, which I don't think base R has a function to parse. The important part is that the bit after sample_boundaries denotes the beginning and end of each sample (the labels are in sample_labels) and group_boundaries/group_labels does the same for groups of regions. So you can subset matrices in the normal way in R accordingly.

For computing the equivalent of plotProfile's output, you can simply apply(m, 2, function(x) mean(x, na.rm=T) (be sure to handle NA values!).

I usually prefer creating plots in R, but if it'd be a good bit of work to just reproduce a moderately complicated equivalent to plotHeatmap in R, so I would encourage you to take the advice from Friederike and tweak the python code in deepTools instead.

ADD COMMENT
0
Entering edit mode

Quick note: shouldn't skip=2 be skip=1? As far as I can tell from my data, the second line contains the first genomic region and thus should be included.

ADD REPLY
0
Entering edit mode

Yes, I think you're correct, it should be skip=1 rather than skip=2. Good catch!

ADD REPLY
1
Entering edit mode

Apropos, the json parsing is very easy (not base R):

library("jsonlite")
h <- scan(mat_file, n=1, sep="\n", what=character())
params <- fromJSON(txt=gsub("@", "", h))
ADD REPLY

Login before adding your answer.

Traffic: 1467 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6