Yes, you can begin with FPKM values, but you will have to transform them and also filter the dataset down to your 800 differentially expressed genes. On that note, 800 is a lot of genes. Try tightening your cut-offs for statistical significance, for example:

- FDR Q<0.05 and absolute log2 fold-change>2
- FDR Q<0.01 and absolute log2 fold-change>4
- FDR Q<0.01 and absolute log2 fold-change>2
- FDR Q<0.001 and absolute log2 fold-change>2
- FDR Q<0.001 and absolute log2 fold-change>4
- FDR Q<0.0001 and absolute log2 fold-change>2

*et cetera.*
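To see how many genes survive each pair of thresholds, something like the following may help. It is only a sketch on simulated data: the data frame `res` and its columns `padj` and `log2FoldChange` are assumed names (they follow DESeq2's conventions), so substitute whatever your differential expression tool produced.

```r
# Simulated results table; 'res', 'padj' and 'log2FoldChange' are
# hypothetical names standing in for your own results
set.seed(1)
res <- data.frame(
  gene = paste0("gene", 1:5000),
  padj = runif(5000)^3,
  log2FoldChange = rnorm(5000, sd = 2),
  stringsAsFactors = FALSE
)

# Grid of cut-offs to try
cutoffs <- expand.grid(fdr = c(0.05, 0.01, 0.001, 0.0001), lfc = c(2, 4))

# Count the genes passing each combination
cutoffs$n_genes <- mapply(
  function(fdr, lfc) {
    sum(res$padj < fdr & abs(res$log2FoldChange) > lfc, na.rm = TRUE)
  },
  cutoffs$fdr, cutoffs$lfc
)
print(cutoffs)

# Keep the gene names for one chosen combination
keep <- which(res$padj < 0.01 & abs(res$log2FoldChange) > 2)
DiffExpressedGenes <- res$gene[keep]
```

Pick the combination that leaves a number of genes you can still read on a heatmap.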

Use this code (below).

- Your FPKM values will be stored in *MyFPKMValues*
- *DiffExpressedGenes* will comprise a single vector of genes that are differentially expressed
- The *zFPKM* package will be used to convert your FPKM values to the Z-scale prior to clustering

---

**Set colour and heatmap scaling breaks**

```
require("RColorBrewer")
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=101)
```

**Scale the FPKM values to the Z scale**

```
library(zFPKM)
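# zFPKM() takes a data frame and returns one; heatmap.2() needs a numeric matrix
heat <- as.matrix(zFPKM(as.data.frame(MyFPKMValues)))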
```

**Filter your dataset to include your differentially expressed genes:**

```
heat <- heat[which(rownames(heat) %in% DiffExpressedGenes), ]
```
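Because `%in%` silently drops names that do not match, it can be worth checking how many of your genes were actually found. A small sketch with toy data (the matrix, the gene names and the `notFound` variable are made up for illustration):

```r
# Toy stand-ins for the objects used above
set.seed(1)
heat <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))
DiffExpressedGenes <- c("gene2", "gene4", "gene9")

# Genes in the list that are absent from the expression matrix
notFound <- setdiff(DiffExpressedGenes, rownames(heat))
notFound

# drop = FALSE keeps a matrix even if only one gene matches
heat <- heat[rownames(heat) %in% DiffExpressedGenes, , drop = FALSE]
dim(heat)
```

If `notFound` is long, the usual culprit is a mismatch in identifier type (for example gene symbols versus Ensembl IDs).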

**Generate heatmaps with Euclidean distance (first) and '1 - Pearson correlation' distance (second); both use Ward's linkage:**

```
library(gplots)

# Euclidean distance
heatmap.2(heat,
  col = myCol,
  breaks = myBreaks,
  main = "Title",
  key = TRUE, keysize = 1.0,
  scale = "none",            # data are already on the Z scale
  density.info = "none",
  reorderfun = function(d, w) reorder(d, w, agglo.FUN = mean),
  trace = "none",
  cexRow = 0.2,
  cexCol = 0.8,
  distfun = function(x) dist(x, method = "euclidean"),
  hclustfun = function(x) hclust(x, method = "ward.D2"))

# 1 - Pearson correlation distance
heatmap.2(heat,
  col = myCol,
  breaks = myBreaks,
  main = "Title",
  key = TRUE, keysize = 1.0,
  scale = "none",            # data are already on the Z scale
  density.info = "none",
  reorderfun = function(d, w) reorder(d, w, agglo.FUN = mean),
  trace = "none",
  cexRow = 0.2,
  cexCol = 0.8,
  distfun = function(x) as.dist(1 - cor(t(x))),
  hclustfun = function(x) hclust(x, method = "ward.D2"))
```

To each heatmap command, you can add *ColSideColors*, which is a vector of colours for a condition of interest, such as case/control status. The order of this colour vector has to match the order of the samples (columns) in the 'heat' object that you pass to `heatmap.2`.
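A minimal sketch of building such a colour vector so that it follows the column order. The toy matrix, the `condition` annotation, the `sideCols` name and the colour choices are all assumptions for illustration:

```r
# Toy Z-scaled matrix: 6 genes x 4 samples (made-up data)
set.seed(1)
heat <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("gene", 1:6),
                               c("case1", "control1", "case2", "control2")))

# Hypothetical sample annotation, deliberately in a different order
condition <- c(control1 = "control", control2 = "control",
               case1 = "case", case2 = "case")

# Reorder by sample name so the annotation follows the columns of 'heat'
condition <- condition[colnames(heat)]
sideCols <- unname(ifelse(condition == "case", "red", "forestgreen"))

# Draw the heatmap only if gplots is installed
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(heat, ColSideColors = sideCols,
                    scale = "none", trace = "none", density.info = "none")
}
```

Indexing the annotation by `colnames(heat)` avoids relying on the two objects happening to be in the same order.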

Note that converting to the Z scale is not the only option: these guys median-centered their FPKM data and then log (base 2) transformed them prior to heatmap generation.
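One possible reading of that alternative, sketched on toy data. The pseudocount of 1 and taking the log before centring (so that no zero or negative values reach `log2`) are my assumptions, not details taken from that study:

```r
# Toy FPKM matrix: 4 genes x 5 samples (made-up values)
MyFPKMValues <- matrix(c(  0,  1,   3,  7,  15,
                          10, 12,   9, 11,  40,
                         100, 80, 120, 90, 110,
                           2,  2,   2,  2,   2),
                       nrow = 4, byrow = TRUE,
                       dimnames = list(paste0("gene", 1:4),
                                       paste0("sample", 1:5)))

# log2 transform with a pseudocount of 1 to cope with zeros
logFPKM <- log2(MyFPKMValues + 1)

# Subtract each gene's median so that every row is centred on 0
heat <- sweep(logFPKM, 1, apply(logFPKM, 1, median), "-")
round(heat, 2)
```

The resulting `heat` matrix can be filtered and passed to `heatmap.2` in the same way as the Z-scaled one, although the colour breaks may need adjusting to the new range.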

An update (6th October 2018): You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units
