Question

Heatmap of RNA-Seq Data

8


13.8 years ago

Ying W ★ 4.3k

I have been working with microarray data so far and am now moving to RNA-seq data. With microarrays I often make heatmaps of the different time points (I have a time-course experiment) to see which time points are similar to each other. Is it possible to do something similar with RNA-seq data, or is such data typically presented in a different way?

I want to see which time points are most similar to each other when compared to a control. Do packages for analysing RNA-seq data give (log) fold changes? And if so, what would go on the vertical axis: genome location, closest gene?

rna heatmap • 28k views


score 6 · Answer 1 · 2011-09-16

Hi Ying

The output of RNA-Seq is usually counts (i.e. whole numbers) rather than the continuous values you get with microarrays. As noted by seidel, you can transform these data into something more like microarray data.
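To make the idea of "transforming counts" concrete, here is a minimal sketch in Python/NumPy. It is only a stand-in and not what any particular Bioconductor package does: it scales each sample to counts per million and takes log2 with a pseudocount. The function name `log2_cpm`, the pseudocount of 1 and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def log2_cpm(counts, pseudocount=1.0):
    """Turn a genes x samples matrix of raw counts into log2 counts-per-million.

    Hypothetical helper for illustration only. The pseudocount keeps
    zero counts finite after the log.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)       # total reads per sample (column)
    cpm = counts / library_sizes * 1e6       # correct for sequencing depth
    return np.log2(cpm + pseudocount)

# Toy data: 3 genes (rows) x 2 samples (columns); sample 2 is sequenced twice as deep.
counts = np.array([[10, 20],
                   [0, 5],
                   [990, 1975]])
logged = log2_cpm(counts)
```

The first gene has 10 reads out of 1000 in one sample and 20 out of 2000 in the other, so after depth scaling it gets the same value in both columns, which is the behaviour you would want before comparing samples in a heatmap.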

The Bioconductor packages work on the count data as counts rather than as FPKM or RPKM. There's a nice discussion of some of the issues surrounding heatmaps and RNA-Seq data on the SeqAnswers site (it's down for me just now, but I'll put the URL in a comment once it's working again).

Anyway, once I had my counts (I used bowtie to assign counts to genes), I used the Bioconductor package edgeR to find genes regulated across time. After reading the SeqAnswers forum (and particularly Simon Anders' contributions), I extracted the count data for my regulated genes, used another Bioconductor package (DESeq) to get log-scale, variance-stabilised data for those genes, and then plotted that in a heatmap, with genes in rows and times or samples in columns.
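The last step of that workflow (subset to the regulated genes, then lay them out with genes in rows and samples in columns) can be sketched as follows. This is a rough Python/NumPy illustration, not the edgeR or DESeq code: it assumes you already have a log-scale expression matrix and a list of regulated gene IDs, and the function names, gene names and numbers are made up.

```python
import numpy as np

def heatmap_matrix(log_expr, gene_ids, regulated_ids):
    """Keep only regulated genes and z-score each gene (row) across samples.

    Hypothetical helper: log_expr is a genes x samples matrix that is
    already on a log (or variance-stabilised) scale.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    regulated = set(regulated_ids)
    keep = [i for i, gene in enumerate(gene_ids) if gene in regulated]
    sub = log_expr[keep, :]
    centred = sub - sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                        # avoid dividing by zero for flat genes
    return centred / sd, [gene_ids[i] for i in keep]

def sample_correlation(matrix):
    """Pearson correlation between samples (columns) of a genes x samples matrix."""
    return np.corrcoef(matrix, rowvar=False)

# Toy data: 4 genes x 3 time points (t0, t1, t2), values invented.
gene_ids = ["geneA", "geneB", "geneC", "geneD"]
log_expr = np.array([[1.0, 2.0, 3.0],
                     [5.0, 5.0, 5.0],
                     [3.0, 2.0, 1.0],
                     [2.0, 4.0, 9.0]])
z, kept = heatmap_matrix(log_expr, gene_ids, ["geneA", "geneC", "geneD"])
corr = sample_correlation(z)                 # 3 x 3, one row/column per time point
```

The matrix `z` is what you would hand to a heatmap function (for example matplotlib's `imshow`), and `corr` gives a quick numerical view of which time points resemble each other.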

I hope this helps you somewhat.

best

d

score 4 · Answer 2 · 2011-09-16

You might take a look at the DESeq or edgeR packages from Bioconductor. You will need to generate a set of counts of reads mapping to genes before using these packages. If this doesn't make sense to you, you should do some background reading on RNA-seq workflows. You could also make use of one of several commercial solutions, such as Partek or CLC bio.

score 1 · Answer 3 · 2011-09-16

Heat maps of your data can be made easily, as Sean said, once you generate some form of normalized gene counts from your reads. An easy way to do this is to run your data through TopHat and then Cufflinks or cuffdiff. With Cufflinks you will get FPKM values for each gene in each sample, and with cuffdiff you can easily get FPKM values and ratios between any pair of samples (e.g. test and control, replicated or not). By default, the ratios are in natural log space (check the column header of the output from your version), but you can transform them to whatever base you like. You can supply a GFF file of gene descriptions, and the output will simply be a table of those genes with quantification. At that point it will basically resemble the structure of the microarray data you are familiar with: the rows are genes (or transcripts, depending on how you run things) and the columns are conditions. One caveat: with RNA-seq you will have genes detected in one sample but not the other (not so good for evaluating ratios), and you'll have to come up with some heuristic that suits you.
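Two of the points above are easy to show in a few lines of Python/NumPy: converting a natural-log ratio to log2, and one possible heuristic for genes with zero signal in one sample. The heuristic here (adding a small floor to both FPKM values before taking the ratio) is only one option and is my assumption, not something Cufflinks does for you. The function names and numbers are invented.

```python
import math
import numpy as np

def ln_to_log2(ln_ratio):
    """Convert natural-log ratios to log2 ratios: log2(x) = ln(x) / ln(2)."""
    return np.asarray(ln_ratio, dtype=float) / math.log(2)

def log2_ratio_with_floor(test_fpkm, control_fpkm, floor=1.0):
    """log2(test / control) after adding `floor` to both values.

    Hypothetical heuristic: the floor keeps the ratio finite when a gene
    is detected in one sample but not the other.
    """
    test = np.asarray(test_fpkm, dtype=float) + floor
    control = np.asarray(control_fpkm, dtype=float) + floor
    return np.log2(test / control)

# A natural-log ratio of ln(8) corresponds to a log2 ratio of 3.
converted = ln_to_log2(math.log(8))

# Toy FPKM values for 3 genes; the first two are missing from one sample each.
ratios = log2_ratio_with_floor([0.0, 3.0, 7.0], [3.0, 0.0, 7.0])
# ratios is [-2, 2, 0]: finite values instead of -inf / +inf
```

A larger floor shrinks the ratios of weakly expressed genes towards zero, so the value you pick is a trade-off between suppressing noise and hiding real on/off changes.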

Bioconductor is another way to go, but even in that case, you would supply a GFF, or a set of bins (perhaps extracted using biomaRt) over which to quantify reads and assess ratios. The output once again is a table, from which you can make a heat map.
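To show what "quantifying reads over a set of bins" means in practice, here is a toy Python sketch. It is deliberately naive (it only looks at each read's start position and loops over every gene) and is not how the Bioconductor tools do it; the function name, coordinates and gene names are all invented for illustration.

```python
def count_reads_per_gene(genes, reads):
    """Count reads whose start position falls inside each gene interval.

    genes: dict of gene_id -> (chrom, start, end), coordinates inclusive
    reads: iterable of (chrom, position) pairs, one per aligned read
    Hypothetical helper for illustration only.
    """
    counts = {gene_id: 0 for gene_id in genes}
    for chrom, pos in reads:
        for gene_id, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[gene_id] += 1
    return counts

# Toy annotation (as you might extract from a GFF) and toy read positions.
genes = {"geneA": ("chr1", 100, 200),
         "geneB": ("chr1", 300, 400)}
reads = [("chr1", 150), ("chr1", 200), ("chr1", 250),
         ("chr2", 150), ("chr1", 350)]
gene_counts = count_reads_per_gene(genes, reads)
# gene_counts == {"geneA": 2, "geneB": 1}
```

Running this once per sample and putting the results side by side gives the genes-by-samples count table that the packages mentioned in this thread take as input.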