24 months ago by

University College London

There are several graphical ways to gauge how effective a normalisation has been. Judging by your second plot, normalisation appears to have been successful in this case.

Apart from box-and-whisker plots, one can also do:

# Violin plot

Using regularised log or variance stabilised counts:

```
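# loggedCounts: genes x samples matrix of transformed counts,
# e.g. assay(rlog(dds)) or assay(vst(dds))
library(reshape2)
library(ggplot2)
# Melting a matrix gives one row per gene/sample pair
violinMatrix <- reshape2::melt(loggedCounts)
colnames(violinMatrix) <- c("Gene","Sample","Expression")
ggplot(violinMatrix, aes(x=Sample, y=Expression)) + geom_violin() +
  theme(axis.text.x = element_text(angle=45, hjust=1))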
```

# Pairwise sample scatter plots

Using regularised log or variance stabilised counts:

```
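library(car)
# car >= 3.0 takes the diagonal panel type as a list;
# older versions used diagonal="boxplot"
scatterplotMatrix(loggedCounts, diagonal=list(method="boxplot"), pch=".")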
```

# Dispersion plot

Using the unlogged, normalised counts, a dispersion plot shows how well the dispersion has been modelled as a function of the mean of normalised counts.

```
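# dds must already have dispersion estimates, e.g. after DESeq(dds)
options(scipen=999)  # avoid scientific notation on the axes
plotDispEsts(dds, genecol="black", fitcol="red", finalcol="dodgerblue", legend=TRUE, log="xy", cex.axis=0.8, cex=0.3, cex.main=0.8, xlab="Mean of normalised counts", ylab="Dispersion")
options(scipen=0)  # restore the default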
```
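
For reference when reading the plot: assuming the default `fitType="parametric"` was used, the fitted (red) line follows the form given in the DESeq2 documentation, written here with its coefficient names `asymptDisp` and `extraPois`:

$$
\text{dispersion}(\bar{\mu}) = \text{asymptDisp} + \frac{\text{extraPois}}{\bar{\mu}}
$$

Here $\bar{\mu}$ is the mean of normalised counts. The second term dominates at low counts and the curve flattens towards `asymptDisp` at high counts. As a rough guide, the gene-wise estimates (black) should scatter around the red line, with the final estimates (blue) shrunk towards it.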

---

# More for outlier detection:

# Bootstrapped hierarchical clustering (unsupervised - i.e. entire dataset)

Using regularised log or variance stabilised counts:

```
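library(pvclust)
# pvclust clusters the columns (samples); nboot=100 is quick, the package default is 1000
pv <- pvclust(loggedCounts, method.dist="euclidean", method.hclust="ward.D2", nboot=100)
plot(pv)
pvrect(pv, alpha=0.95)  # optional: box clusters with AU p-value >= 0.95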
```

# Principal components analysis
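
Using regularised log or variance stabilised counts, a minimal base R sketch with `prcomp()` could look like the following. The matrix `toyCounts` and its sample names are hypothetical stand-ins for `loggedCounts`, simulated only so that the example runs on its own; with real data you would pass `t(loggedCounts)` instead.

```r
# Toy stand-in for loggedCounts (genes in rows, samples in columns).
# The values and the sample names are hypothetical.
set.seed(1)
toyCounts <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
                    dimnames = list(paste0("gene", 1:200),
                                    paste0("sample", 1:6)))
# Shift the last three samples for the first 50 genes to mimic a group effect
toyCounts[1:50, 4:6] <- toyCounts[1:50, 4:6] + 3

# prcomp() expects observations in rows, so transpose to samples x genes
pca <- prcomp(t(toyCounts), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component
pctVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Plot the samples on the first two components
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", pctVar[1]),
     ylab = sprintf("PC2 (%.1f%%)", pctVar[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.8)
```

A sample that sits far from the others in its group on PC1 or PC2 is a candidate outlier. DESeq2 also provides `plotPCA()` for transformed objects if you would rather not build the plot by hand.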

# Symmetrical sample heatmap

Using regularised log or variance stabilised counts:

```
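library(gplots)
library(RColorBrewer)
# hmcol was not defined in the original snippet; this palette is an assumed choice
hmcol <- colorRampPalette(brewer.pal(9, "GnBu"))(100)
distsRL <- dist(t(loggedCounts))  # Euclidean distances between samples
mat <- as.matrix(distsRL)
# metadata: assumed to be your sample sheet, rows in the same order as the columns of loggedCounts
rownames(mat) <- colnames(mat) <- paste(metadata$IDlist, metadata$condition, sep=", ")
hc <- hclust(distsRL)
heatmap.2(mat, Rowv=as.dendrogram(hc), symm=TRUE, trace="none", col=rev(hmcol), cexRow=1.0, cexCol=1.0, margins=c(13, 13), key=FALSE)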
```

Kevin