I have a gene expression matrix from NGS data. It's around 30,000 genes and 1,000 samples.
I want to create a heatmap of this gene expression matrix. I used the
heatmap() function in R, but it does not work for very large data. Could someone recommend a package for creating a heatmap of a big data matrix?
Well, you're going to want to subset that anyway, since a 30,000 × 1,000 heatmap won't be very interpretable.
Like Devon Ryan said, it won't make sense to create a heatmap that large. Why don't you find the differentially expressed genes and create a heatmap of those genes instead?
I want a global view of the expression landscape of my genes. That's why I want to look at it this way.
I still don't think it's a good idea, but if you really want to, you can use the R package pheatmap to create and plot clusters of similarly expressed genes. Instead of plotting 30,000 genes, you would plot some number (25, 50, 100, or more) of clusters of similarly expressed genes by providing a value to the
kmeans_k parameter of the pheatmap() function. If you want to cluster rows, use
cluster_rows = TRUE, and to cluster columns, use
cluster_cols = TRUE (you may want to do both because of the large dataset).
You can cluster both the rows and the columns using either a precomputed distance matrix or a distance measure such as "euclidean" or "correlation", via the clustering_distance_rows and clustering_distance_cols arguments.
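Putting those pieces together, a minimal sketch might look like this. It assumes your matrix is genes × samples; `expr` is a placeholder name, and here it is filled with simulated values just so the example is self-contained:

```r
library(pheatmap)

# Simulated stand-in for a real expression matrix (genes x samples)
set.seed(1)
expr <- matrix(rnorm(30000 * 100), nrow = 30000,
               dimnames = list(paste0("gene", 1:30000),
                               paste0("sample", 1:100)))

# kmeans_k = 50 collapses the 30,000 genes into 50 k-means cluster
# centroids, so the heatmap shows 50 rows instead of 30,000.
pheatmap(expr,
         kmeans_k = 50,                           # number of row clusters to plot
         cluster_rows = TRUE,                     # cluster the centroid rows
         cluster_cols = TRUE,                     # cluster the samples
         clustering_distance_rows = "correlation",
         clustering_distance_cols = "euclidean",
         show_colnames = FALSE)                   # 100 column labels won't fit
```

With kmeans_k set, hierarchical clustering only has to deal with 50 centroid rows rather than the full matrix, which is what makes this tractable.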
The pheatmap package is new to me, thanks for pointing it out!
I've used the package once. It's a very powerful package, and you can do a lot with it, provided you read the manual thoroughly. It makes "pretty heatmaps" of your ugly data, hence the name.
Your global view won't be changed by subsetting a bit.
Then what would be a reasonable subset size?
Try 1,000 or so genes and 100 samples, then increase that a bit and see whether anything changes much. If it doesn't, your subset is capturing the gist of the global structure.
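One way to sketch that subsetting in R (`expr` is a placeholder for your full genes × samples matrix, simulated here; keeping the most variable genes is just one common choice, you could also sample rows at random):

```r
set.seed(42)
# Stand-in for the full 30,000 x 1,000 expression matrix
expr <- matrix(rnorm(30000 * 1000), nrow = 30000)

# Keep the 1,000 highest-variance genes and a random 100 samples
gene_var  <- apply(expr, 1, var)
top_genes <- order(gene_var, decreasing = TRUE)[1:1000]
keep_cols <- sample(ncol(expr), 100)
sub <- expr[top_genes, keep_cols]

dim(sub)  # 1000 x 100
```

To check stability, rerun with 2,000 genes or 200 samples and compare the resulting heatmaps; if the block structure looks the same, the smaller subset is representative enough.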