Question: CPM read count normalization: what does it mean between replicates of same group and within same replicate?
salamandra wrote (2.0 years ago):


1- In the table 'Normalization method' here, it says that CPM (counts per million) can be used for gene count comparisons between replicates of the same sample group.

1.1 Does this mean that, e.g., we can compare a gene in one 'control' sample with the same gene in another 'control' sample, but cannot compare a gene in a 'control' sample with the same gene in a 'treatment' sample?

1.2 If so, when looking at a heatmap of CPM values, can we not identify, e.g., genes that seem to have higher expression in 'treatment' samples than in 'control' samples? Do we need to use a different normalization method?

2- The same table says that CPM cannot be used for within-sample comparisons.

2.1 Does this mean we cannot compare different genes within the same sample?

2.2 What if, when looking at a CPM heatmap, one gene seems to vary more between 'control' and 'treatment' than another? Can we draw this conclusion if the heatmap plots CPM values?

ATpoint wrote (2.0 years ago):

As recommended in this presentation, I would not use per-million methods for anything, as there are better methods now. Check this video to get an idea of why per-million-based methods are not optimal, and this one on how the normalization in e.g. DESeq2 works.
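To make the difference concrete, here is a minimal Python sketch (not the DESeq2 code itself) that compares CPM with size factors computed in the style of the median-of-ratios approach. The toy count matrix and the function names `cpm` and `median_of_ratios_size_factors` are made up for illustration. It assumes that the main weakness of per-million scaling is composition bias: one strongly induced gene inflates the library size and drags down the CPM of every other gene.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6

def median_of_ratios_size_factors(counts):
    """Size factors in the style of the median-of-ratios method (sketch only)."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with nonzero counts in every sample
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean acts as a pseudo-reference sample
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to the reference
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Hypothetical toy data: rows are genes, columns are (control, treatment)
counts = np.array([
    [100,  100],
    [200,  200],
    [300,  300],
    [400,  400],
    [ 50, 5000],  # the only gene that really changes
])

cpm_values = cpm(counts)
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors

print(cpm_values.round(0))   # unchanged genes look lower in treatment
print(size_factors)          # both size factors are 1 for this toy data
print(normalized)            # unchanged genes stay equal across samples
```

In this toy case the first four genes have identical raw counts in both samples, yet their CPM values drop in the treatment column because the induced gene takes up most of the library. The median-based size factors are not affected by that single gene.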

Regarding your questions:

1 - You can use it, but it is not recommended for DE (differential expression) analysis, so it is better not to use it at all.

1.1 - Simply normalize the entire dataset with edgeR or DESeq2 and do the comparisons with those values.

1.2 - Do not use CPM values for a heatmap; use logged/normalized counts, like those produced by the vst or rlog functions in DESeq2. Using non-log counts will bias the heatmap towards highly expressed genes. The video series I linked above also has a video about logs, in case you care.

2 - True, because CPM does not normalize for gene length, so at the same expression level longer genes inherently have higher counts than shorter genes.

2.1 - One probably could, but not without adjusting for gene length (use the search function for this; there are plenty of posts on the matter already).

2.2 - It might give you an idea, but you should use appropriate statistics to infer differentially expressed genes.
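As a rough illustration of points 1.2 and 2, the Python sketch below shows two things on made-up numbers. First, a length adjustment (here TPM, one common choice and not the only one) makes genes within a sample comparable. Second, a log scale changes which gene stands out in a heatmap. The `log2(x + 1)` transform is only a simple stand-in for vst or rlog and is not equivalent to them. The toy counts are assumed to be already normalized between samples, and the helper names `tpm` and `log2_normalized` are hypothetical.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length first, then by the sample total."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1000.0
    rate = counts / lengths_kb            # reads per kilobase
    return rate / rate.sum(axis=0) * 1e6

def log2_normalized(norm_counts, pseudocount=1.0):
    """Simple log2 transform with a pseudocount (stand-in for vst/rlog)."""
    return np.log2(np.asarray(norm_counts, dtype=float) + pseudocount)

# Hypothetical data: rows are genes, columns are (control, treatment)
counts = np.array([
    [1000, 1100],  # long, highly expressed gene (10 kb)
    [ 100,  110],  # short gene (1 kb)
    [  10,   40],  # short, lowly expressed gene (1 kb)
])
lengths_bp = np.array([10000, 1000, 1000])

# Within a sample: the first two genes differ 10-fold in raw counts
# but have the same length-adjusted abundance.
tpm_values = tpm(counts, lengths_bp)

# For a heatmap: center each gene on the log scale.
log_values = log2_normalized(counts)
row_centered = log_values - log_values.mean(axis=1, keepdims=True)

print(tpm_values.round(0))
print(row_centered.round(2))
```

On the raw scale the first gene shows the largest absolute difference between the two samples (100 counts versus 30 for the third gene), so it would dominate a heatmap. After the log transform the third gene, which changes about fourfold, shows the largest deviation. Either way, a heatmap only gives a visual impression; calling genes differentially expressed still needs a proper statistical test.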


Thank you for the answer and the video on DESeq2 normalization

(reply written 2.0 years ago by salamandra)