Visualizations of ChIP-Seq data using Heatmaps

Question

Forum:Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)

90

Entering edit mode

8.1 years ago

Sinji ★ 3.2k

Visualizations of ChIP-Seq data using Heatmaps

Updated 3/9/16 (commit: "Included a very simple and preliminary image for the genomation R package")
Updated 3/10/16 (commit: "Included a very simple and preliminary image for the SeqPlots R package")
Updated 6/10/16 (commit: "Included EaSeq, missing heatmap image.")

Introduction

I am rather new to the bioinformatics field, and much of my current work has been on the analysis and visualization of individual ChIP-Seq data or in combination with other sequencing data. A large part of this is the generation of heatmaps that accurately represent our the data in defined regions, yet are attractive and require little to no altering for publishing purposes.

Problem

The visualization of data is important. Various different tools have been created to allow the visualization of said data. But no two tools are the same regardless of the same input data / format. Why is this?

What other tools are you aware of for producing heatmaps?

Tools

Deeptools: https://github.com/fidelram/deepTools
Ngsplot: https://github.com/shenlab-sinai/ngsplot
ChAsE: http://chase.cs.univie.ac.at/overview
HOMER: http://homer.salk.edu/homer/
genomation: https://bioconductor.org/packages/devel/bioc/html/genomation.html
SeqPlots: https://github.com/Przemol/seqplots
EaSeq: http://easeq.net/

Results

The following section will consist of the heatmaps produced by each program to identify aesthetic differences.

The heatmaps will consist of Pol II N20 ChIP-Seq data overlapped to a dataset of 20,345 Gencode protein_coding genes looking specifically the TSS and a -2000/+2000 region surrounding the TSS. A binsize of 25 will be used for all of the tools (when applicable) unless otherwise stated to keep test times to a minimum. I will attempt to keep the images as close together in size as possible (I do not know if this bares any significance in the outlook of a heatmap). For sorting I will be using mean signal intensity, for those tools that do not have a method for computing the mean for each row in a matrix, I will be using the max method. I will attempt to use bigWig files whenever possible to keep results due to error in file type processing to a minimum, but this is completely out of my control.

It is important to note some of the tools appear to require pre-processing of the dataset, while others accept bedfiles easily and give the option of choosing a region to plot.

Deeptools

Deeptools is my main visualization tool. It is easy to use, powerful, accurate and requires little pre-processing of my gencode dataset. While deeptools has many uses, I will be specifically focusing on heatmap creation, and nothing outside of that. While being one of my favorite visualization tools, the heatmaps it produces are not often as attractive as those produced by other programs (see HOMER results) though it is still my first choice in any preliminary analysis of data.

Deeptools has two steps: the creation of a matrix file, and the actual generation of the heatmap. The computeMatrix step has two options, a reference-point centered option that we will use for this example (good for creating a heatmap based on the start, end or center of a dataset) and a scale-regions option that is good for heatmap creation of variable length datasets (signal at exons, etc).

INPUT: BigWig files of ChIP-seq marks, and feature dataset in BED format.

computeMatrix reference-point -S $Pol_II -R $GENCODE.v19 -a 2000 -b 2000 -out $OUTPUT -bs 25 -p 6 --missingDataAsZero

plotHeatmap -m $INPUT_CM --colorList white black --heatmapHeight 25 --heatmapWidth 3 --samplesLabel "Pol II" -out $OUTPUT.png --whatToShow 'heatmap and colorbar' --sortUsing max

I've used mostly default settings for the deeptools output labels since we are working at the TSS. I also omitted the profile plot that is included in a default deeptools heatmap.

Heatmap generated by Deeptools

NGSPLOT

The more often I attempt to use NGSPLOT the more I am disappointed with it. Unlike deeptools it doesn't have a very active support base, so any sort of problems you have with the tool you will have to solve yourself. It is also often difficult to work with, I've run into many problems trying to use a custom dataset that were just not worth the troubleshooting. The tool also requires you to pre-process your data so that NGSPLOT knows we are only interested in the TSS and not the whole gene. This pre-processing was done using a combination of Awk / R and codes can be provided if interested though it is relatively simple for most of you I assume.

However, for those who are interested in a easy to use tool that produces attractive heatmaps at the TSS easily, and requires little to no altering in photoshop then NGSPLOT is fine.

INPUT: BAM files of ChIP-seq marks (must create a config file if using more than one mark), and a feature dataset (there are custom RefSeq and Ensembl datasets already built into the tool) in BED format.

ngs.plot.r -G hg19 -R bed -C $INPUT -O $OUTPUT.pdf -E $GENCODE.v19 -L 2000 -CS 25 -WD 6 -CO black -P 6 -GO max

Again I generally used default settings for labels.

Heatmap generated by NGSPLOT

ChAsE

Chase is a rather new tool that I've only recently started looking into. Originally I did most of my analysis with either deeptools or ngsplot, but came across this tool while searching through some bioinformatics blogs.

The tool contains a GUI, which makes it quite nice for those that are afraid or new to the command line. Of course this gives the program its limitations, but i've actually grown quite fond of the attractive heatmaps and easy to use program. It's powerful enough for at least the basic analysis that I conduct on a day to day basis, however my biggest complaint is that the documentation is HORRIBLE. Seriously. I've found a few videos on the tool, but they are just horrid.

That being said, i'd use this program over NGSPLOT anyday, and while it's not as powerful as deeptools (few visualization programs probably are), the heatmaps it generates are much more aesthetically pleasing to look at. The resulting image must be input into a image editor in order to fill in labels since the program's default generated labeling is ... almost as atrocious as it's documentation.

INPUT: BigWig files of ChIP-seq marks, and a feature dataset. This tool requires that you pre-process your features before inputting.

Heatmap generated by ChAsE

HOMER

HOMER is a software suite beast. It does anything from visualization to annotation of peaks and more. It has some of the most expansive and well kept documentation of any program i've seen and is pretty easy to use.

I have e-mailed the support team for HOMER a time or two and always recieved relatively prompt responses. However, because the HOMER software is so expansive and is meant to do a lot of things at once, the actual heatmap generation portion is actually very limiting.

First off, it doesn't allow for a un-uniform bp length from the point of interest (for example, instead of -2k/+2k from the TSS, HOMER doesn't allow for -500/+2k from the TSS. Also HOMER itself does not generate heatmaps, it only produces the matrix file needed to produce these heatmaps. You must use Cluster 3.0 and Java TreeViewer in order to visualize and generate the final heatmaps.

The output is a very attractive (probably most out of all the programs mentioned thus far) heatmap, that requires much labeling, but does not seem to be visually accurate, though aesthetically pleasing. There is also no color bar support.

INPUT: Wiggle / Bedgraphs (bedgraphs in this case) files of ChIP-seq marks, and a feature dataset (also has a simple TSS option that can be used in its place). This tool requires that you pre-process your features before inputting.

annotatePeaks.pl $GENCODE.v19 hg19 -gtf $REFERENCE_GTF -size 4000 -hist 25 -ghist -bedGraph $POL-BEDGRAPH

Resulting txt file must be input into the cluster 3.0 program to output a cdt file, and the cdt file must then be run through a custom script to sort by enrichment signal.

Heatmap generated by HOMER

Genomation

Genomation is a R package that can be installed via the Bioconductor 'suite'. The package assumes that you have an intermediate knowledge of R and is thus one of the more difficult to use heatmap visualization tools that has been/will be mentioned in this post. The tool itself is very flexible, allowing you to generate a matrix and then generating a heatmap or metagene plot via that matrix, as well as includes clustering and annotation features. It lacks some of the simpleness of programs such as deeptools and NGSPLOT, but it makes up for it with the ability to generate publish-ready figures and since it is an R package, you are free to use many of R's plotting options to make an even more attractive heatmap.

The package has two steps similar to Deeptools: 1) a matrix generating step and the 2) actual heatmap creation step.

Genomation offers two options for the matrix generating step:

ScoreMatrix - which produces a base-pair resolution matrix of scores for a feature of interest, the only caveat is that all the features of interest must be of equal lengths.
ScoreMatrixBin - this option will first bin each feature of interest to a equal number of bins and then calculates the summary matrix for scores of each bin. The matrix only supports mean, max, and min enrichment modes, with mean being the default.

The heatmap produced uses the ScoreMatrixBin option with 160 bins and a 4000 bp window with the TSS as the center in order to keep in line with the 25 bp bin of the other tools.

Due to my limited experience using R, I was not able to properly alter the heatmap colors to keep it in line with the others.

INPUT: BigWig / BAM files of your ChIP-seq marks and a feature file loaded into R using the readGeneric function of Genomation (you will need to pre-process your data).

Heatmap generated by Genomation

SeqPlots

SeqPlots is an R package GUI tool that can be used for the visualization of track signals (BigWigs) and sequence motifs. The tool is quite sophisticated despite it's GUI, but is thus still limited in certain aspects. However because it is web browser based tool it is easily used from any OS platform.

The tool is simple to use and the heatmaps produced are attractive and similar to ngsplot in aesthetics. The tool does not assume any R knowledge, thus making it easy to use for those without a strong Unix / Command line / R background.

It is also important to note that your feature dataset does NOT require any pre-processing (or little) as the program allows you to choose from several regions of your dataset (the start point, mid point, and end point) similar to Deeptools.

However, because the R package is GUI based the options for several things are limiting. Easily the most detrimental aspect of the tool is that reading in feature files is difficult. It takes only a specific formatted bed file that must end with the .bed extension or otherwise any upload will fail. Therefore feature files must be in a BED6 format prior to upload. The tool also only currently has a couple of sorting options including mean, and max methods, leaving much to be desired. Reference files are limited to those found in R annotation packages, and must be created if not already available in these packages which could be difficult to those not familiar with R. The program also lacks a nice custom color editor that would allow you to customize heatmaps, instead forcing you to to choose from a (quite large) variety of R custom color sets.

INPUT: BigWig files of your ChIP-seq marks and a feature file that does not have to be pre-processed to your regions of interest.

Heatmap generated by SeqPlots

EaSeq

EaSeq is a windows-based visualization and data exploration program for a variety of sequencing datasets including ChIP, RNA-seq, and DIP-seq. It is easily one of the more comprehensive GUI based heatmap softwares I have worked with, allowing for all manners of customization and sorting.

The heatmaps produced by EaSeq are attractive, more so than deepTools, but less so than HOMER. The only problem with this software is that it is limited in scope to Windows based platforms. As easy as it may be to launch a virtual machine of Windows, it adds extra complexity to a relatively simple task.

The software is a perfect example of how a well designed GUI can be as customizable as a command line program. EaSeq also allows for many other visualizations, including but not limited to boxplots, bar charts, pie graphs, metagene plots, and signal tracks.

INPUT: Most data types that contain coordinate positions (BAM, BED) are accepted as 'datasets', regions of interest are accepted as 'regionsets' and do not need to be pre-processed since the program allows for choosing between start, center, and end of region sets for heatmap analysis. Interestingly the program allows downloading of various genome gene datasets, which is very helpful to newer individuals.

Conclusion

Many programs exist for the visualization of ChIP-seq data. The output of programs differ highly between them though in theory they should all be similar in appearance. What causes such variety between programs? What other programs do you use to visual your data?

I am interested in exploring other data visualization tools, any suggestions will be documented and posted here in case anyone ever finds it useful.

visualization ChIP-Seq homer ngsplot deeptools • 50k views

ADD COMMENT • link updated 12 months ago by Ram 43k • written 8.1 years ago by Sinji ★ 3.2k

4

Entering edit mode

deeptools developer here: How can we improve the produced heatmaps? We are happy to get feedback.

ADD REPLY • link 8.1 years ago by Fidel ★ 2.0k

0

Entering edit mode

Do you have an e-mail address where I could send a detailed response to? I suppose I could just use the Google Groups forum if you'd prefer.

One thing I would like to suggest right now that really irks me, and i'm not sure if there are already plans for the implementation in the future, is there is no way to give individual --colorLists to separate heatmaps. I think this function already exists in the plotProfile function.

Right now if I have to generate 8 separate heatmaps for 8 separate markers, I usually end up having to run computeMatrix and plotHeatmap 8 times to get 8 appropriately colored heatmaps.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

5

Entering edit mode

We have added the multiple color option to the upcoming release! Right now is on the develop branch in github. multiple color heatmaps

The boxes around the heatmaps that divide a clustering, can also be removed.

multiple colors no box

Some examples can be seen in the documentation

ADD REPLY • link 8.0 years ago by Fidel ★ 2.0k

0

Entering edit mode

I love how deeptools is evolving day by day. Such a useful resource. Great!!

ADD REPLY • link 8.0 years ago by ivivek_ngs ★ 5.2k

0

Entering edit mode

Wow. I applaud you and your team for all your hardwork! Deeptools has easily become my 'go-to' when someone asks me general 'how do I visualize' questions for most sequencing data.

Keep it up, this is amazing.

ADD REPLY • link 8.0 years ago by Sinji ★ 3.2k

0

Entering edit mode

How did you manage to make the color scale for 0-1 for every heatmap. I tried --zMin and --zMax, and i got the impression that rpkm value (from bamCoverage) was not converted to the range of provided --zMin and --zMax. Rather it was plotting only the range of provided -zMin and --zMax. So, if a given plot has --yMin 0 and --yMAx 10, and if i give --zMin 0 and --zMax 1, it will plot values only between 0 and 1 in heatmap.

ADD REPLY • link 2.7 years ago by Chirag Nepal ★ 2.4k

0

Entering edit mode

Our email addresses are listed here, though you can also use either the github repository issue tracker or the google group.

Fidel has an open enhancement request for multiple colorlists. He and I are of somewhat separate minds on that one, since we'd then want to allow different scales and that may or may not be a good idea, depending on the dataset (I'm mostly hesitant about this since it's a convenient way for the naive user to create heatmaps that are easy to misinterpret).

ADD REPLY • link 8.1 years ago by Devon Ryan 104k

1

Entering edit mode

Genomation bioconductor package is nice too. Does clustering and manual annotation as well.

ADD REPLY • link 8.1 years ago by poisonAlien ★ 3.2k

1

Entering edit mode

EaSeq looks like a promising GUI http://www.nature.com/nsmb/journal/vaop/ncurrent/full/nsmb.3180.html

ADD REPLY • link 8.1 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

You're right, I'd have loved to try it out. The windows only platform is .. interesting .. and slightly frustrating.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

The windows only platform

Really? I had a tab with the paper open to read it, but that sort of put me off. I might give it a go nonetheless for the benefit of my Windows lab colleagues.

ADD REPLY • link 8.1 years ago by A. Domingues ★ 2.7k

0

Entering edit mode

Yea, here's a direct quote from the actual tool page:

It is made for Windows, and it therefore requires a virtual machine to run on Mac OS or Linux. We have tested it succesfully using Parallels for Mac.

It's sad because the tool looked good and the paper was nice. But sounds like more work than what it's worth with the other tools already mentioned being available.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

well, that's why I am not testing it for now :) potentially helpful for wet lab guys though.

ADD REPLY • link 8.1 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

@Sinji I know It has been a while but I saw that your comment on HOMER was pretty supportive. I like that suit like you mentioned too. But I think the developer stopped the maintenance for some reason. I couldnt get any of my questions answered. Is there a way we ask specific questions related to the usage of the software? Even though, it a very good software for the analysis, lack of support make me used other softwares.

Thanks.

ADD REPLY • link 7.6 years ago by morovatunc ▴ 550

1

Entering edit mode

I don't think the developers have stopped maintenance, a few months back the HOMER site was down and after a couple of e-mails, they put the site back up. They've also responded to a couple of my e-mails, but they haven't addressed some bugs that i've found so getting support directly from the developers seems to be hit or miss at best.

At the bottom of the HOMER site main page you can find some information on authors and where it's being developed. That being said, your best chance at getting questions answered is here on Biostars or directly contacting the developers.

ADD REPLY • link 7.6 years ago by Sinji ★ 3.2k

0

Entering edit mode

This is a terrific post!! Hope to see this continually updated. I've tried ngsplot. Doing a simple quick plot is easy, but once you change anything from default, it becomes confusing. Also since it is written as rscript, it's actually making it hard for me to easily use.

ADD REPLY • link 6.8 years ago by olechnwin ▴ 60

1

Entering edit mode

Highly recommend deeptools. I discourage the use of NGSPLOT (as you can probably tell in this post) because of the same reasons you seem to be disliking it.

HOMER and enrichedHeatmap are great alternatives as well and produce very aesthetically pleasing heatmaps.

ADD REPLY • link 6.8 years ago by Sinji ★ 3.2k

3

Entering edit mode

8.1 years ago

Ming Tommy Tang ★ 3.9k

I have a write-up here, but you have covered most of them https://github.com/crazyhottommy/ChIP-seq-analysis#heatmap-mata-plot

ADD COMMENT • link 8.1 years ago by Ming Tommy Tang ★ 3.9k

2

Entering edit mode

I forgot to mention Meta-seq https://pythonhosted.org/metaseq/example_session.html

ADD REPLY • link 8.1 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

Ahh .. i've actually read your GitHub page quite a bit when I was first starting. In fact, I believe my list of tools was mostly de-rived from your list. I'll have to give the Genomation package a try.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

Wow. This is a great resource. I've been trying to wrap my head around analyzing chip-seq for some data that's coming in. This will help tremendously. Thanks.

ADD REPLY • link 8.1 years ago by Damian Kao 16k

0

Entering edit mode

I have read your blog posts as well :) sharing makes a better world.

ADD REPLY • link 8.1 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

Your blogs have been really very much helpful and quite detailed I must say. Have been using them to tweak for my own use. +1 for the github link.

ADD REPLY • link 8.1 years ago by ivivek_ngs ★ 5.2k

2

Entering edit mode

8.1 years ago

Chirag Nepal ★ 2.4k

I came across another new tool (currently only supports windows) which generates different plots. It is easy to use and is interactive.

Software described in the paper http://www.nature.com/nsmb/journal/vaop/ncurrent/full/nsmb.3180.html

ADD COMMENT • link 8.1 years ago by Chirag Nepal ★ 2.4k

1

Entering edit mode

I think this tool was mentioned in a comment by tangming2005. I don't think i'll be adding the tool into the first post for now, simply because it's a windows tool and it's probably not something I would use. I'll add it in when time allows or if I can get access to a windows computer from a lab member though since the program does look quite nice. If you've used it to generate some heatmaps please feel free to post a link to the image so I can check it out, if they're particularly appealing i'll go ahead and bite the bullet and install a virtual machine to run it on my workstation.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

1

Entering edit mode

8.1 years ago

Ian 6.0k

SeqMiner is another, but it has not been updated since 2013.

ADD COMMENT • link 8.1 years ago by Ian 6.0k

0

Entering edit mode

I actually purposely left this program out because I found it down-right terrible. Granted I only spent a few minutes using it, but there's nothing there that could make it better than the alternatives already mentioned besides a GUI for those that prefer to do their work outside a terminal.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

1

Entering edit mode

8.1 years ago

A. Domingues ★ 2.7k

I haven't tried it myself, but saw a live demo of SeqPlots and it looked very good and responsive: SeqPlots

ADD COMMENT • link 8.1 years ago by A. Domingues ★ 2.7k

0

Entering edit mode

Ahh .. I actually have this installed because I intended to use it, but never actually got around to it! Thanks for reminding me, i'll take a look at it sometime today.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

Cool. I had the chance to chat with the PI of the group that developed it, Julie Ahringer, and she showed how to use it in a few minutes. I got the impression that as soon as the data is ready in the right format - bed of features, wigs, etc - it was very fast to just interactively explore the data. That also means that it might take some time to get all pieces needed for that.

It looked useful, but we don't deal with ChIP-seq data so much, so I never got around to try it out.

ADD REPLY • link 8.1 years ago by A. Domingues ★ 2.7k

0

Entering edit mode

Just went ahead and added up a section for SeqPlots. The only real annoyance is that the data upload is very strict. If my feature files didn't end with a .bed extension, it would fail. If my feature file had an extra column denoting something of importance for myself, the upload would fail.

Other than that is was pretty straightforward and was able to generate the heatmap in a matter of minutes.

ADD REPLY • link 8.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

Sweet. Thanks a lot for testing all those tools!

ADD REPLY • link 8.1 years ago by A. Domingues ★ 2.7k

0

Entering edit mode

7.2 years ago

FeiZhao • 0

maybe you can try ChIPseeker http://www.bioconductor.org/packages/release/bioc/html/ChIPseeker.html

ADD COMMENT • link 7.2 years ago by FeiZhao • 0

0

Entering edit mode

I have looked into ChIPseeker, unfortunately there are better programs and thus the reason I did not include it. If time permits, I will include some information on this.

ADD REPLY • link 7.1 years ago by Sinji ★ 3.2k

0

Entering edit mode

4.7 years ago

ATpoint 82k

For a minimal tutorial on how to use EnrichedHeatmap see Using EnrichedHeatmap for visualization of NGS experiments

enter image description here

ADD COMMENT • link 4.7 years ago by ATpoint 82k

0

Entering edit mode

4.3 years ago

Ömer An ▴ 260

Very useful tutorial!

I have just integrated the most popular two of these tools (ngsplot and deepTools) to the CSI NGS Portal.

You can generate all the plots for your RNA-Seq/ChIP-Seq samples in a comprehensive, robust and quick way with a few clicks using the portal.

ADD COMMENT • link 4.3 years ago by Ömer An ▴ 260

score 7 · Accepted Answer · 2016-05-08

7

Entering edit mode

8.0 years ago

Zuguang Gu ▴ 220

I am also development an R package EnrichedHeatmap which can make such kind of plot. It is built on the ComplexHeatmap package that actually you can concatenate multiple heatmaps (both this specific type of heatmaps and normal heatmaps) one after another. It is also easy to adjust rows by hierarchical clustering or k-means. With this package, it is convenient to make complex heatmaps to visualize association between multiple sources of information.

Following is an example generated by the package:

enter image description here

ADD COMMENT • link 8.0 years ago by Zuguang Gu ▴ 220

0

Entering edit mode

second to this! very useful and user supportive.

ADD REPLY • link 8.0 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

I'll give this package a shot and add it up to the main list this week. I went ahead and accepted it as an answer for now so that it's more visible to the general public.

This looks like a really interesting package, I especially love the concatenation ability. Looks really promising. Especially if @tangming2005 likes it. Thanks for sharing!

ADD REPLY • link 8.0 years ago by Sinji ★ 3.2k

score 6 · Accepted Answer · 2016-03-30

Two useful command-line tools to make interesting heatmaps with are matrix2png (link) and ImageMagick convert.

One can start out with a pair of tab-delimited text-formatted matrices with the same row order. To show their different signal types, they can be rendered with two different color palettes. ImageMagick convert can be used to make a composite that highlights the rows where an interesting phenomenon is happening.

For example, some sorted DNaseI-seq data for a series of padded TSSs:

enter image description here

And some NET-seq data over the same rows, with the same row sort order:

enter image description here

Here is the composite of the two heatmaps:

enter image description here

One can imagine layering in ChIP-seq signal or other data categories as needed.

These simple, modular command-line tools provide very powerful options and enable fast exploration of data in different forms.

Here are a couple snippets of matrix files to feed into matrix2png:

And:

Then:

$ matrix2png -data $inCcMtx -size 1:1 -numcolors 220 -mincolor black -midcolor black -maxcolor white -trim 1 > $outCcPng
$ matrix2png -data $inNetseqMtx -size 1:1 -numcolors 220 -mincolor white -midcolor white -maxcolor red -trim 1 > $outNetseqPng
$ convert $outCcPng $outNetseqPng -gravity center -compose Darken -composite $outCompositePng

These tools can be reused for all manner of figures, not just ChIP-seq.