Question: CPM read count normalization: what does it mean between replicates of same group and within same replicate?
12 months ago by
salamandra270 wrote:


1- The table 'Normalization method' here says that CPM (counts per million) can be used for gene count comparisons between replicates of the same sample group.

1.1 Does this mean that, e.g., we can compare a gene in one 'control' sample with the same gene in another 'control' sample, but that we cannot compare a gene in a 'control' sample with the same gene in a 'treatment' sample?

1.2 If so, then when looking at a heatmap of CPM values, can we not, e.g., identify genes that seem to have higher expression in 'treatment' samples than in 'control' samples? Do we need to use a different normalization method?

2- The same table says that CPM cannot be used for within-sample comparisons.

2.1 Does it mean we cannot compare different genes of the same sample?

2.2 What if, when looking at a CPM heatmap, one gene seems to vary more between 'control' and 'treatment' than another? Can we draw this conclusion if the heatmap plots CPM values?

12 months ago by
ATpoint28k wrote:

As recommended in this presentation, I would not use per-million methods for anything as there are better methods now. Check this video to get an idea why per-million based methods are not optimal and this one on how the normalization in e.g. DESeq2 works.
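To make the difference concrete, here is a minimal NumPy sketch with hypothetical toy counts. It is not the DESeq2 implementation, only the median-of-ratios idea as I understand it, and the function names `cpm` and `median_of_ratios_size_factors` are made up for this example. One commonly cited weakness of per-million scaling is that a single strongly induced gene inflates the library size and pushes the CPM of every other gene down.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6

def median_of_ratios_size_factors(counts):
    """Per-sample size factors in the spirit of the median-of-ratios method."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with non-zero counts in every sample enter the reference.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of each sample's ratio to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy counts: rows are genes, columns are two samples.
# Sample 2 equals sample 1 except that the last gene is strongly induced.
counts = np.array([
    [100, 100],
    [200, 200],
    [300, 300],
    [400, 4000],
])

cpm_values = cpm(counts)
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors

print("CPM:\n", np.round(cpm_values))
print("size factors:", size_factors)
print("median-of-ratios normalized:\n", normalized)
```

In this toy case the three unchanged genes get different CPM values in the two samples, while the median-of-ratios size factors leave them identical. Real data are noisier, so treat this as an illustration only and use the actual packages for analysis.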

Towards your questions:

1 - You can use it, but it is not recommended for differential expression (DE) analysis, so it is better not to use it at all.

1.1 - Simply normalize the entire dataset with edgeR or DESeq2 and do comparisons with these values

1.2 - Do not use CPM values for a heatmap; use logged, normalized counts, like those produced by the vst or rlog functions in DESeq2. Using non-log counts will bias the heatmap towards highly expressed genes. The video series I linked above also has a video about logs in case you care.

2 - True, because CPM does not normalize for gene length, so longer genes inherently get higher counts than shorter genes expressed at the same level.

2.1 - one probably could, but not without adjusting for gene length (use the search function on this, there are plenty of posts on that matter already out there).

2.2 - it might give you an idea but you should use appropriate statistics to infer differentially expressed genes.
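As a rough illustration of points 1.2 and 2 above, here is a small NumPy sketch. All numbers are hypothetical, the per-kilobase adjustment is a simple RPKM-style division, and the plain log2 with a pseudocount of 1 is only a stand-in for a proper transformation such as vst or rlog.

```python
import numpy as np

# Point 2: CPM ignores gene length (hypothetical single-sample toy data).
# Both genes are assumed to be expressed at the same level, but gene_long is
# four times longer and therefore collects four times as many reads.
counts = np.array([400.0, 100.0])      # gene_long, gene_short
lengths_kb = np.array([4.0, 1.0])      # gene lengths in kilobases
library_size = 1_000_000.0             # assumed total mapped reads in the sample

cpm = counts / library_size * 1e6      # differs 4-fold between the two genes
cpm_per_kb = cpm / lengths_kb          # RPKM-style length adjustment, now equal

# Point 1.2: linear vs. log scale (hypothetical normalized values).
# Rows are genes, columns are (control, treatment); both genes double.
expr = np.array([
    [10.0, 20.0],        # lowly expressed gene
    [1000.0, 2000.0],    # highly expressed gene
])
linear_diff = expr[:, 1] - expr[:, 0]        # dominated by the highly expressed gene
log_expr = np.log2(expr + 1)                 # pseudocount of 1 avoids log2(0)
log_diff = log_expr[:, 1] - log_expr[:, 0]   # close to 1 for both genes

print("CPM:", cpm, "CPM per kb:", cpm_per_kb)
print("linear difference:", linear_diff)
print("log2 difference:", np.round(log_diff, 2))
```

On the linear scale the highly expressed gene dominates the colour range of a heatmap even though both genes change by the same factor; on the log scale the two changes look about the same.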


Thank you for the answer and the video on DESeq2 normalization

written 12 months ago by salamandra270