Clustering/Heatmap of count RNAseq data
11 months ago

Dear all,

I think this is a pretty trivial question, but how should one normalize RNA-seq count data for clustering, MDS, etc.? Will this suffice?

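library(DESeq2)
# Simulate a 100-gene x 10-sample matrix of negative binomial counts
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1/0.5), ncol = 10)
# Variance-stabilize the counts, then cluster the sample-sample correlations
normed <- varianceStabilizingTransformation(cnts)
pheatmap::pheatmap(cor(normed))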

RNA-Seq Normalization • 442 views

From the DESeq2 help page on rlog:

The rlog transformation produces a similar variance stabilizing effect as varianceStabilizingTransformation, though rlog is more robust in the case when the size factors vary widely. The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.

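library(DESeq2)
# Same simulated counts as in the question: 100 genes x 10 samples
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1/0.5), ncol = 10)
# Regularized log transformation instead of the VST
normed <- rlog(cnts)
pheatmap::pheatmap(cor(normed))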

11 months ago
ATpoint 55k

For heatmaps and other downstream analyses such as PCA or any kind of classification/machine learning, one commonly uses vst/rlog output or something like the normalized counts on the log2 scale. For Pearson correlation (cor), it depends on what you want to show. The correlation obviously changes when you apply a log transformation, because the log scale is not linear (which is the whole point of logs). If you want to see how your samples compare in terms of a traditional Pearson correlation, then I'd use the raw counts or the output of counts(dds, normalized=TRUE). Both give the same correlations because they are on the same linear scale: normalization by DESeq2 just divides each sample's raw counts by a single factor (its size factor).
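
The scaling argument above can be checked with a small base-R sketch. It does not call DESeq2: the size factors here are made-up values (an assumption for illustration; DESeq2 would estimate them from the data), and the log2(x + 1) transform merely stands in for a log-scale transformation, not for vst or rlog themselves.

```r
set.seed(1)

# Simulated counts: 100 genes x 10 samples, as in the question
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1/0.5), ncol = 10)

# Hypothetical per-sample size factors (assumption, not estimated by DESeq2)
sf <- runif(ncol(cnts), min = 0.5, max = 2)

# Divide each column (sample) by its own size factor
norm_cnts <- sweep(cnts, 2, sf, "/")

# Sample-sample Pearson correlations on three scales
cor_raw  <- cor(cnts)
cor_norm <- cor(norm_cnts)
cor_log  <- cor(log2(norm_cnts + 1))

# Per-sample scaling leaves Pearson correlation unchanged (up to rounding)
same_linear <- isTRUE(all.equal(cor_raw, cor_norm))

# A log transformation is not linear, so the correlations differ
same_log <- isTRUE(all.equal(cor_raw, cor_log))

print(same_linear)
print(same_log)
```

Pearson correlation is invariant to multiplying either variable by a positive constant, which is why dividing each sample by one factor cannot change cor(), while any nonlinear transformation can.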


Thank you, ATpoint. Exactly what I needed to know.