CPM function in edgeR
16 months ago
francesca3 ▴ 70

Hi, I have some doubts about the cpm function in the edgeR package. Given this initial part of the code:

> library(edgeR)
> group <- factor(c("A", "A", "B", "B"))
> y <- DGEList(counts=data_clean,group=group)
> y <- calcNormFactors(y)


I obtain different values if I calculate CPM in this way:

> cpm <- cpm(y, log = FALSE, normalized.lib.sizes=TRUE)


or in this way:

> cpm <- cpm(y$counts, log = FALSE, normalized.lib.sizes=TRUE)

Why are they different? What changes between the two calls, and which is the correct way to calculate CPM? I noticed that if I calculate the values "by hand" with the standard formula, I obtain the same values as the second call (y$counts). The edgeR manual seems to use only the y argument.

Thank you, Francesca

edgeR RNA-Seq CPM

Please use the formatting bar in edit mode (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Sorry. Thank you a lot!!


Hello francesca3!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/133466/

This is typically not recommended as it runs the risk of annoying people in both communities.

16 months ago
ATpoint 57k

This is normal and expected. With the first chunk, cpm() receives the DGEList, so it uses the norm.factors computed by calcNormFactors and corrects the counts for both library size and library composition; see this video for the theory of library normalization.

With the second chunk you pass only the count matrix, so the function has no norm.factors available and only scales the counts per million, i.e. it corrects for library size alone. That is why it matches your by-hand calculation.
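
To make the arithmetic concrete, here is a minimal base-R sketch of the two calculations. It does not call edgeR at all: the count matrix is made up, and the `norm_factors` values are hypothetical stand-ins for what calcNormFactors would estimate, so treat it as an illustration of the formula rather than of the package internals.

```r
# Toy count matrix (made-up numbers): 3 genes x 4 samples,
# chosen so that every library sums to exactly 1000 reads.
counts <- matrix(c( 10,  20,  30,  40,
                   100, 200, 300, 400,
                   890, 780, 670, 560),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Raw library size = column sums of the count matrix.
lib_size <- colSums(counts)

# Hypothetical normalization factors (assumption: in a real analysis
# these would come from calcNormFactors; here they are invented).
norm_factors <- c(0.8, 1.25, 1.0, 1.0)

# "By hand" CPM: scale each sample by its raw library size only.
cpm_by_hand <- t(t(counts) / lib_size) * 1e6

# Normalized CPM: scale by the effective library size,
# i.e. raw library size multiplied by the normalization factor.
effective_lib_size <- lib_size * norm_factors
cpm_normalized <- t(t(counts) / effective_lib_size) * 1e6
```

Samples whose factor is 1 give identical values in both matrices, while the others differ by exactly the factor. With a real DGEList, the effective library sizes would be y$samples$lib.size * y$samples$norm.factors.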

The first method is the one you should use.


Thank you! Very clear!
