Question

TMM followed by inverse normal transform

2.6 years ago

joseph.a.decorte ▴ 10

Hey all,

I am following a protocol from a paper that uses the following pre-processing procedure:

a. Read counts were normalized between samples using TMM (Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25 (2010)).

b. Expression values for each gene were inverse normal transformed.

I used edgeR::calcNormFactors to normalize library sizes via TMM for part a, but I am confused about how to apply an inverse normal transform to my read counts together with my normalized library sizes. What am I misunderstanding? I know that I can apply other transforms such as cpm or rpkm to the results of calcNormFactors, and they will use the normalized library sizes. Is there a similar function for inverse normal transformation?

Appreciate any help.

edgeR GTEx RNAseq • 1.4k views

updated 2.6 years ago by Gordon Smyth ★ 7.0k • written 2.6 years ago by joseph.a.decorte ▴ 10

score 5 · Accepted Answer · 2021-09-08

2.6 years ago

Gordon Smyth ★ 7.0k

I am not convinced that the inverse normal transformation is a good idea, so we don't provide a function for it in edgeR. Here is how you would do it, however:

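library(edgeR)                        # dge is assumed to be an existing DGEList
dge <- calcNormFactors(dge)           # TMM normalization factors (TMM is the default method)
logCPM <- cpm(dge, log=TRUE)          # log2-CPM computed from the normalized library sizes
n <- ncol(logCPM)                     # number of samples
zvalues <- qnorm(ppoints(n))          # n standard normal quantiles, in increasing order
z <- logCPM
# For each gene, replace the k-th smallest logCPM by the k-th normal quantile
for (i in seq_len(nrow(z))) z[i,] <- zvalues[order(order(z[i,]))]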

The inverse normal values are now stored in z.

written 2.6 years ago by Gordon Smyth ★ 7.0k
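
To see what the rank-based step in the answer does, here is a minimal base-R sketch on a small made-up matrix. The matrix `toy`, the helper `inv_normal`, and all the numbers are illustrative assumptions standing in for real logCPM values; only `qnorm`, `ppoints`, and the `order(order(...))` idiom come from the answer.

```r
# Toy matrix standing in for logCPM: 3 hypothetical genes x 5 hypothetical samples
toy <- matrix(c(2.1, 5.3, 3.3, 0.4, 4.0,
                7.2, 6.8, 9.1, 8.5, 7.9,
                1.0, 1.5, 0.2, 3.3, 2.7),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("s", 1:5)))

n <- ncol(toy)
zvalues <- qnorm(ppoints(n))   # target normal quantiles, sorted ascending

# order(order(x)) returns the rank of each element of x
# (ties, if any, are broken by position)
inv_normal <- function(x) zvalues[order(order(x))]

# Apply gene by gene; apply() returns samples x genes, so transpose back
z_toy <- t(apply(toy, 1, inv_normal))
dimnames(z_toy) <- dimnames(toy)

# Every gene ends up with the same set of values, arranged by that gene's ranks
stopifnot(all(apply(z_toy, 1, function(r) isTRUE(all.equal(sort(r), zvalues)))))
```

Because every gene is mapped onto the same n quantiles, differences in mean and variance between genes are discarded and only the within-gene ordering of samples survives. If a gene has tied values, this idiom assigns them different quantiles according to sample order, which may or may not be acceptable for a given analysis.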

This looks fantastic, thank you! I have also read a few papers about overuse of INT in unfitting scenarios, but right now I’m just trying to replicate the data in a paper… Any resources you recommend for exploring other normalization methods? Thanks again.

written 2.6 years ago by joseph.a.decorte ▴ 10

The edgeR recommendation is simply to use logCPM for most purposes (other than the DE analysis itself, which does not require normalized expression values).

written 2.6 years ago by Gordon Smyth ★ 7.0k
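
For readers wondering how the normalized library sizes enter a CPM calculation, the following base-R sketch may help. It is a simplified illustration, not edgeR's implementation: the count matrix and the normalization factors are made-up values, and the final `log2(cpm + 1)` is a stand-in for the prior-count handling that `cpm(dge, log=TRUE)` performs internally.

```r
# Hypothetical raw counts: 4 genes x 3 samples (made-up numbers)
counts <- matrix(c( 10,  20,  15,
                     0,   5,   2,
                   100, 250, 180,
                    40,  60,  55),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

lib_size     <- colSums(counts)          # raw library sizes
norm_factors <- c(1.05, 0.95, 1.00)      # made-up stand-ins for TMM factors
eff_lib      <- lib_size * norm_factors  # effective (normalized) library sizes

# Counts per million relative to the effective library sizes
cpm_manual <- sweep(counts, 2, eff_lib, "/") * 1e6

# Simplified log transform; edgeR's cpm(log=TRUE) treats the offset differently
log_cpm_manual <- log2(cpm_manual + 1)
```

The point of the sketch is that the normalization factors never alter the counts themselves. They only rescale the library sizes that the counts are divided by, which is why functions such as cpm can pick them up automatically from the object returned by calcNormFactors.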