How to plot a heatmap of gene expression for a very large data set?
9.8 years ago
jack ▴ 960

Hi,

I have a gene expression matrix from NGS data, with around 30,000 genes and 1,000 samples.

I want to create a heatmap of this gene expression matrix. I used the heatmap() function in R, but it does not work for data this large. Could someone recommend a package for creating a heatmap of a big data matrix?

next-gen RNA-Seq R • 14k views

Well you're going to want to subset that anyway, since a 30000x1000 heatmap won't be very interpretable.


Like Devon Ryan said, it won't make sense to create a heatmap that large. Why don't you find the differentially expressed genes and create a heatmap of those genes instead?


I want a global view of the expression landscape of my genes. That's why I want to look at it this way.


Although I still don't think it's a good idea, if you really want to, you can use the R package pheatmap to cluster similarly expressed genes and plot the clusters. So instead of plotting 30,000 genes, you will be plotting some number (25, 50, 100, or more) of clusters of similarly expressed genes, set by passing a value to the kmeans_k parameter of the pheatmap function. To cluster rows, use cluster_rows=TRUE, and to cluster columns, use cluster_cols=TRUE (you may want to do both because of the large dataset).

You can cluster both the rows and the columns using either a precomputed distance matrix or a named distance measure such as "euclidean" or "correlation" (via the clustering_distance_rows and clustering_distance_cols parameters).
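A minimal sketch of the k-means approach described above, assuming the pheatmap package is installed. The matrix `expr` is simulated here, and its size and the choice of 50 clusters are illustrative assumptions rather than recommendations; substitute your own genes-by-samples matrix.

```r
# Simulated stand-in for a genes x samples expression matrix (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(2000 * 50), nrow = 2000, ncol = 50,
               dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:50)))

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # kmeans_k groups the rows into k clusters before drawing, so the heatmap
  # shows k rows (cluster centers) instead of one row per gene.
  res <- pheatmap::pheatmap(
    expr,
    kmeans_k = 50,
    cluster_rows = TRUE,                       # cluster the k-means centers
    cluster_cols = TRUE,                       # cluster the samples
    clustering_distance_rows = "euclidean",
    clustering_distance_cols = "correlation",
    show_colnames = FALSE
  )
}
```

Because k-means starts from random centers, the clusters can differ between runs unless a seed is set, so it may be worth trying a few values of kmeans_k before reading much into any one picture.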


The pheatmap package is new to me, thanks for pointing it out!


I used the package once. It's a very powerful package, and you can do a lot of things with it, provided you read the manual thoroughly. It makes "pretty heatmaps" of your ugly data, hence the name.


Your global view won't be changed by subsetting a bit.


Then, what is a reasonable subset size?


Try 1,000 or so genes and 100 samples, then increase that a bit to see if there are any large changes. If there aren't, then you're catching the gist of the global structure in your subset.
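A rough sketch of that check in base R. The data are simulated, `subset_matrix` is a hypothetical helper, and picking rows and columns at random is an assumption (the suggestion above does not say how to choose the subset); swap in your own matrix and selection rule.

```r
set.seed(42)
# Simulated stand-in for the real matrix, kept smaller than 30000 x 1000 so it runs quickly.
expr <- matrix(rnorm(5000 * 200), nrow = 5000, ncol = 200)

# Hypothetical helper: randomly pick n_genes rows and n_samples columns.
subset_matrix <- function(mat, n_genes, n_samples) {
  rows <- sample(nrow(mat), min(n_genes, nrow(mat)))
  cols <- sample(ncol(mat), min(n_samples, ncol(mat)))
  mat[rows, cols, drop = FALSE]
}

small  <- subset_matrix(expr, 1000, 100)
larger <- subset_matrix(expr, 2000, 150)

# Draw both and compare the overall block structure by eye.
heatmap(small,  labRow = NA, labCol = NA)
heatmap(larger, labRow = NA, labCol = NA)
```

If the two plots show the same broad groups of genes and samples, the smaller subset is probably enough for a global view.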

9.6 years ago

The pheatmap function (pheatmap package) and the heatmap.2 function (gplots package) in R could be useful for this task. If these don't work, an alternative would be to write your own script to draw the heatmap using the ReportLab graphics module in Python (or any other available graphics module).

7.7 years ago
Guangchuang Yu ★ 2.6k
> ?image
image                 package:graphics                 R Documentation

Display a Color Image

Description:

     Creates a grid of colored or gray-scale rectangles with colors
     corresponding to the values in ‘z’.  This can be used to display
     three-dimensional or spatial data aka _images_.  This is a generic
     function.

     The functions ‘heat.colors’, ‘terrain.colors’ and ‘topo.colors’
     create heat-spectrum (red to white) and topographical color
     schemes suitable for displaying ordered data, with ‘n’ giving the
     number of colors desired.

Usage:

     image(x, ...)

     ## Default S3 method:
     image(x, y, z, zlim, xlim, ylim, col = heat.colors(12),
           add = FALSE, xaxs = "i", yaxs = "i", xlab, ylab,
           breaks, oldstyle = FALSE, useRaster, ...)

image() is the fastest command in R for displaying a heatmap.
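A small sketch of that approach, using simulated data. Note that image() draws the matrix as given, with no clustering or dendrograms, so any row or column ordering has to be done beforehand. Setting useRaster = TRUE is optional and assumes a regular grid, which holds for the integer coordinates used here.

```r
set.seed(1)
# Simulated genes x samples matrix (hypothetical data).
expr <- matrix(rnorm(3000 * 100), nrow = 3000, ncol = 100)

# image() maps the rows of z to the x axis, so transpose to put
# samples on the x axis and genes on the y axis.
image(
  x = seq_len(ncol(expr)),
  y = seq_len(nrow(expr)),
  z = t(expr),
  col = heat.colors(12),
  useRaster = TRUE,          # draw as a bitmap rather than one polygon per cell
  xlab = "Samples", ylab = "Genes"
)
```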
