Question: Which counts to use for RNA-seq heatmap and PCA?
Lucy wrote:

Hi,

I have RNA-seq data that I would like to visualise with a PCA plot and a heatmap. I am wondering whether I should use normalised counts or log-transformed normalised counts for this.

I have generated TMM-normalised counts per million in edgeR as follows:

y <- calcNormFactors(y)
tmm <- edgeR::cpm(y)

I have also generated log2-transformed TMM-normalised CPMs:

tmm_log <- edgeR::cpm(y, log = TRUE, prior.count = 1)

I am wondering whether it is best to use just the normalised CPMs, or the log-transformed normalised CPMs for a PCA plot and heatmap. I find that the plots look better when I use log-transformed normalised counts, but I am not sure whether this is the correct approach.

Could someone please explain why you would/would not want to use log counts?

Many thanks,

Lucy

Tags: heatmap, edger, rna-seq, pca
modified 5 weeks ago • written 5 weeks ago by Lucy

Thank you, I am currently scaling by row using the heatmap.2 function from the gplots package. Is this an acceptable way to do the scaling?

modified 5 weeks ago • written 5 weeks ago by Lucy

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment should go under @ATpoint's answer.

SUBMIT ANSWER is for new answers to the original question.

written 5 weeks ago by genomax

Without code I cannot comment.

written 5 weeks ago by ATpoint

heatmap.2(tmm_log, trace = "none", col = bluered(20), scale = "row")

written 5 weeks ago by Lucy
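
To make the row-scaling question concrete, here is a minimal sketch in base R of what scaling by row amounts to: each gene has its mean subtracted and is divided by its standard deviation, which is the same as the t(scale(t(x))) idiom. The matrix below is simulated toy data standing in for tmm_log, and the names z_manual and z_scale are made up for illustration.

```r
# Toy stand-in for tmm_log: 5 genes x 6 samples (simulated, not real data)
set.seed(1)
tmm_log <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# Row scaling by hand: subtract each gene's mean, then divide by its SD
centered <- sweep(tmm_log, 1, rowMeans(tmm_log))
z_manual <- sweep(centered, 1, apply(tmm_log, 1, sd), "/")

# scale() works on columns, hence the two transposes
z_scale <- t(scale(t(tmm_log)))

# Both routes give the same Z-scores (scale() only adds extra attributes)
same <- isTRUE(all.equal(z_manual, z_scale, check.attributes = FALSE))
print(same)

# After scaling, every gene has mean ~0 and SD 1 across samples
print(round(rowMeans(z_scale), 10))
print(apply(z_scale, 1, sd))
```

One caveat, stated as my reading of the gplots source rather than as documented behaviour: heatmap.2 appears to compute its dendrograms before applying scale = "row", so the clustering is based on the unscaled values. If the clustering should reflect the Z-scores, one option is to scale the matrix first and then call heatmap.2 with scale = "none".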
ATpoint wrote:

For PCA, one typically uses log2-transformed normalized counts, so in this case tmm_log. For heatmaps, one is typically interested in the relative differences between samples, so it makes sense to Z-transform each gene (row) of tmm_log, e.g. with t(scale(t(tmm_log))). This gives, for each gene, the deviation of each sample from that gene's mean across all samples, in units of standard deviations. While it is technically possible to use tmm_log directly in heatmaps, it is typically not a good choice: expression values differ greatly between genes because of endogenous expression levels and differences in gene length, so a few highly expressed genes would dominate the color scale. That is why Z-transformation is a good choice.

modified 5 weeks ago • written 5 weeks ago by ATpoint
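
As a rough illustration of why the log scale is usually preferred for PCA, the sketch below simulates CPM-like data in base R and compares how much of the total variance a single gene contributes on the linear scale versus the log2 scale. Everything here is an assumption made for illustration: the data are simulated, and names such as log_expr, cpm_like and top_share do not come from the thread. In a real analysis, tmm and tmm_log from the question would take the place of cpm_like and log_expr.

```r
set.seed(42)
n_genes   <- 200
n_samples <- 6

# Simulated log2 expression: gene-specific means plus per-sample noise
gene_mean <- rnorm(n_genes, mean = 4, sd = 2)
log_expr  <- matrix(rnorm(n_genes * n_samples, mean = gene_mean, sd = 0.5),
                    nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)),
                                    paste0("sample", seq_len(n_samples))))

# Hypothetical group effect: the first 20 genes are higher in samples 4 to 6
log_expr[1:20, 4:6] <- log_expr[1:20, 4:6] + 2

# Back-transform to a linear, CPM-like scale
cpm_like <- 2^log_expr

# Share of the total variance contributed by the single most variable gene
top_share <- function(x) {
  v <- apply(x, 1, var)
  max(v) / sum(v)
}
print(top_share(cpm_like))  # linear scale
print(top_share(log_expr))  # log2 scale

# prcomp() expects samples in rows, so the genes x samples matrix is transposed
pca_log <- prcomp(t(log_expr))
pca_lin <- prcomp(t(cpm_like))

# Proportion of variance explained by each principal component
print(summary(pca_log)$importance["Proportion of Variance", ])

# Sample coordinates on the first two components, ready for plotting
print(pca_log$x[, 1:2])
```

On the linear scale, the variance of a gene grows with its expression level, so the most highly expressed genes tend to drive the leading components. The log transform compresses that range, which lets moderately expressed genes contribute to the sample separation as well.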