Is calcNormFactors necessary when calculating RPKM?
2.3 years ago
wangdp123 ▴ 250

There seem to be two ways of calculating RPKM values with edgeR's rpkm function from the output of featureCounts. One calls calcNormFactors first and the other doesn't.

Which one is more appropriate for generating RPKM values in an RNA-seq analysis, and why?

1)

# fc is the featureCounts output; no normalization factors are estimated here
x <- DGEList(counts=fc$counts, genes=fc$annotation[,c("GeneID","Length")])
x_rpkm <- rpkm(x, gene.length=x$genes$Length)


2)

# same input, but TMM normalization factors are estimated before calling rpkm
y <- DGEList(counts=fc$counts, genes=fc$annotation[,c("GeneID","Length")])
y <- calcNormFactors(y)
y_rpkm <- rpkm(y, gene.length=y$genes$Length)


2.3 years ago
ATpoint 54k

The second one. It takes the TMM-derived scaling factors into account, which correct for changes in library composition in addition to sequencing depth. In the first version the normalization factors stay at their default of 1, so rpkm only scales by the raw library sizes. There is a video on YouTube on TMM normalization (StatQuest series) that explains this quite well.
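
To make the difference concrete, here is a minimal sketch on simulated counts (the matrix, gene lengths and sample names are made up for illustration and are not from the original post). It assumes edgeR is installed and that, as far as I understand it, rpkm on a DGEList divides by the effective library size, i.e. lib.size * norm.factors, unless normalized.lib.sizes=FALSE is set.

```r
library(edgeR)

set.seed(1)
n_genes <- 500

# Hypothetical toy data: 500 genes x 4 samples of negative binomial counts
counts <- matrix(rnbinom(n_genes * 4, mu = 100, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", 1:4)))

# Mimic a composition change: a few genes are strongly up in sample S4 only
counts[1:10, "S4"] <- counts[1:10, "S4"] * 50

# Hypothetical gene lengths in bp
gene_length <- sample(500:5000, n_genes, replace = TRUE)

y <- DGEList(counts = counts, genes = data.frame(Length = gene_length))
y <- calcNormFactors(y)  # TMM by default

# RPKM using effective library sizes (option 2 in the question)
rpkm_tmm <- rpkm(y, gene.length = y$genes$Length)

# RPKM using raw library sizes only (equivalent to option 1)
rpkm_raw <- rpkm(y, gene.length = y$genes$Length, normalized.lib.sizes = FALSE)

# Manual calculation: counts / (effective library size / 1e6) / (length / 1e3)
eff_lib <- y$samples$lib.size * y$samples$norm.factors
manual  <- t(t(y$counts) / eff_lib) * 1e9 / gene_length

print(y$samples)                                        # norm.factors differ from 1
print(all.equal(manual, rpkm_tmm, check.attributes = FALSE))
```

Comparing rpkm_raw and rpkm_tmm for the genes that were not inflated in S4 shows what the scaling factors are doing: the raw version pushes those genes down in S4 simply because a handful of other genes take up a larger share of the reads.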