Question: Does Gene length corrected TMM [GeTMM] violate any assumptions of TMM normalization?
I've read, and been told, that edgeR must be given raw counts only. I noticed that in GeTMM, RPK (reads per kilobase) values are fed into the TMM normalization procedure instead. Is this a correct usage, given the assumptions of TMM?

Should this be used for downstream DGE analysis?

Note: I am no expert in these methods; I just wanted to ask the community.

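```r
library(edgeR)

# x: first column = gene length (assumed to be in bp), other columns = raw read counts
# calculate RPK (reads per kilobase): divide each gene's counts by its length in kb
rpk <- as.matrix(x[, -1]) / (x[, 1] / 1000)
# remove the length column from x, keeping only the raw counts
x <- as.matrix(x[, -1])
# for normalization purposes only: put all samples in one group
group <- rep("A", ncol(x))

# standard TMM on raw counts
x.norm.edger <- DGEList(counts = x, group = group)
x.norm.edger <- calcNormFactors(x.norm.edger)
norm.counts.edger <- cpm(x.norm.edger)

# GeTMM: TMM on RPK values instead of raw counts
rpk.norm <- DGEList(counts = rpk, group = group)
rpk.norm <- calcNormFactors(rpk.norm)
norm.counts.rpk_edger <- cpm(rpk.norm)
```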

# Source:
Question written 21 months ago by O.rka; modified 21 months ago by Damian Kao.
Technically, RPK values do not violate the assumptions of TMM.

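TMM is just a technique that tries to find the non-DE portion of the expression distribution by very liberally trimming off outliers: by default, edgeR's calcNormFactors() trims 30% of genes at each end of the log-ratio (M-value) distribution and 5% at each end of the average-expression (A-value) distribution, then computes the scaling factor from the genes that remain. It doesn't matter what kind of expression units you are using.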
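To make the trimming concrete, here is a minimal base-R sketch of an unweighted TMM factor for one sample against a reference. It is a simplification, not edgeR's implementation: calcNormFactors() also applies precision weights derived from the counts, picks the reference sample itself, and rescales the factors to a geometric mean of one. The function name `tmm_factor` and the toy data are hypothetical; the trim fractions mirror edgeR's defaults (`logratioTrim = 0.3`, `sumTrim = 0.05`).

```r
# Simplified, UNWEIGHTED TMM scaling factor for one sample against a reference.
# Hypothetical helper for illustration; edgeR's calcNormFactors() also uses
# precision weights and rescales factors to a geometric mean of 1.
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  p_obs <- obs / sum(obs)               # expression as a proportion of the library
  p_ref <- ref / sum(ref)
  M <- log2(p_obs / p_ref)              # per-gene log ratio
  A <- 0.5 * log2(p_obs * p_ref)        # per-gene average log expression
  ok <- is.finite(M) & is.finite(A)     # drop genes with a zero in either sample
  M <- M[ok]
  A <- A[ok]
  n <- length(M)
  # keep genes that are not extreme in M (30% per end) or in A (5% per end)
  lo_m <- floor(n * logratio_trim) + 1
  hi_m <- n + 1 - lo_m
  lo_a <- floor(n * sum_trim) + 1
  hi_a <- n + 1 - lo_a
  keep <- rank(M) >= lo_m & rank(M) <= hi_m &
          rank(A) >= lo_a & rank(A) <= hi_a
  2^mean(M[keep])                       # back-transform the trimmed mean
}

# Toy data: 100 genes, 10 of them strongly up in `obs`
ref <- rep(100, 100)
obs <- ref
obs[1:10] <- 1000

f <- tmm_factor(obs, ref)
f                                   # ~0.526, i.e. 100/190
sum(obs) * f                        # effective library size ~10000, same as sum(ref)
tmm_factor(obs / 1000, ref / 1000)  # same factor after rescaling all values
```

The 10 strongly expressed genes inflate the library size of `obs`; the trimmed mean of the remaining genes corrects for that, and dividing all inputs by a constant leaves the factor unchanged. Note that RPK rescales each gene by a different constant, which this toy example does not exercise.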

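However, RPK values do violate the assumptions of DE analysis: edgeR's negative binomial model expects raw counts, whose mean-variance relationship is distorted once they are divided by gene length. So you cannot use them for downstream DGE.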
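One way to see the mean-variance problem is the base-R sketch below. It assumes Poisson-distributed replicate counts with mean 50 and a hypothetical gene length of 4 kb (both numbers are illustrative, not from the thread). Dividing by the length in kb divides the variance-to-mean ratio by that same length, so a count model fitted to RPK values would misjudge the noise by a gene-length-dependent factor.

```r
# Sketch: dividing counts by gene length (kb) rescales the variance-to-mean
# ratio that count-based DE models rely on. All numbers are hypothetical.
set.seed(1)
counts <- rpois(1e5, lambda = 50)  # replicate counts for one gene, Poisson noise
gene_length_kb <- 4                # hypothetical gene length
rpk <- counts / gene_length_kb

disp_counts <- var(counts) / mean(counts)  # close to 1 for Poisson counts
disp_rpk <- var(rpk) / mean(rpk)           # close to 1/4: noise looks 4x smaller

disp_counts
disp_rpk
```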

Answer written 21 months ago by Damian Kao.

I understand your point, but then why does the article use GeTMM for DE as well? See figure 7 here

Reply written 9 months ago by salamandra.
For DGE, use raw counts, as the software requires. Other normalizations can be used for things like visualization.
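For such visualization purposes, it helps to know what the GeTMM values in the question actually are. By default, `cpm()` on a `DGEList` divides each column by its effective library size (column total times normalization factor) and scales to one million, so `norm.counts.rpk_edger` is the RPK matrix rescaled this way using the TMM factors computed on it. The base-R sketch below reproduces that arithmetic; the function name `normalized_cpm` and the toy numbers and factors are hypothetical.

```r
# Normalized counts-per-million, as cpm() computes them on a DGEList
# (log = FALSE): values / (column total * normalization factor) * 1e6.
# Hypothetical helper for illustration only.
normalized_cpm <- function(values, norm_factors) {
  eff_lib_size <- colSums(values) * norm_factors  # effective library sizes
  # divide column j by eff_lib_size[j]: t() makes R recycle along columns
  t(t(values) / eff_lib_size) * 1e6
}

# Toy example: 2 genes x 2 samples with hypothetical TMM factors
values <- cbind(s1 = c(10, 90), s2 = c(30, 70))
out <- normalized_cpm(values, norm_factors = c(0.8, 1.25))
out
```

With all factors equal to 1 every column would sum to one million; the TMM factors shift each column up or down to compensate for composition differences between samples.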

Answer written 21 months ago by swbarnes2.
