Clustering/Heatmap of count RNAseq data
3.4 years ago

Dear all,

I think this is a pretty trivial question, but how should one normalize RNAseq count data for clustering/MDS/etc.? Will this suffice?

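library(DESeq2)
# simulated counts: 100 genes x 10 samples
cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
# variance-stabilized values
normed <- varianceStabilizingTransformation(cnts)
# heatmap of sample-to-sample Pearson correlations
pheatmap::pheatmap(cor(normed))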
RNA-Seq Normalization • 1.1k views

From the DESeq2 help page on rlog:

The rlog transformation produces a similar variance stabilizing effect as varianceStabilizingTransformation, though rlog is more robust in the case when the size factors vary widely. The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.

library(DESeq2)
# simulated counts: 100 genes x 10 samples
cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
# regularized log transformation instead of the VST
normed <- rlog(cnts)
pheatmap::pheatmap(cor(normed))
3.4 years ago
ATpoint 82k

For heatmaps and other downstream analyses such as PCA or any kind of classification/machine learning, one commonly uses vst/rlog or something like the normalized counts on the log2 scale. For Pearson correlation (cor) it depends on what you want to show. The correlation obviously changes when you apply a log transformation, since the log scale is not linear (which is the whole point of logs). If you want to see how your samples compare in terms of a traditional Pearson correlation, then I'd use the raw counts or the output of counts(dds, normalized=TRUE). Both give the same correlations because they are on the same linear scale: normalization by DESeq2 just divides each sample's raw counts by a single size factor.
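To make the point about scaling concrete, here is a minimal base R sketch. It does not call DESeq2: the size factors are made-up numbers drawn with runif (an assumption for illustration, not DESeq2's estimates), and the variable names are hypothetical. It only illustrates that dividing each sample by its own positive factor leaves the sample-to-sample Pearson correlation unchanged, while a log2 transform does not.

```r
# Illustrative sketch only: size factors are invented, not estimated by DESeq2.
set.seed(1)

# Simulated counts: 100 genes x 10 samples, as in the question.
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1 / 0.5), ncol = 10)

# Hypothetical per-sample size factors (one positive number per column).
size_factors <- runif(ncol(cnts), min = 0.5, max = 2)

# Divide each column (sample) by its size factor.
normalized <- sweep(cnts, 2, size_factors, "/")

# Pearson correlation between samples is unchanged by per-sample scaling.
same_on_linear_scale <- isTRUE(all.equal(cor(cnts), cor(normalized)))

# A log2 transform (pseudocount of 1) is non-linear, so correlations shift.
logged <- log2(normalized + 1)
max_shift_after_log <- max(abs(cor(normalized) - cor(logged)))

print(same_on_linear_scale)
print(max_shift_after_log)
```

The first value should come out TRUE, while the second shows how far the correlations move once the data are logged. So the choice between raw/normalized counts and a log-like scale (vst, rlog, log2) is what matters for cor, not the size-factor normalization itself.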


Thank you ATpoint. Exactly what I needed to know.
