What are median and quantile normalization?
3.0 years ago
pyKey ▴ 50

Hello everyone,

Normally I use TPM for within-sample analysis. Recently I was advised to use median and quantile between-sample normalization. I noticed that the DESeq and limma packages offer these methods. But what are they doing? What is the intuition behind them?

Thank you all,

RNA-Seq Normalization • 5.7k views

Right! So more explanation:

I have a bunch of RNA-Seq experiments and I am performing some simple gene expression comparisons between two conditions (wildtype vs. mutants). Some conditions have at most two replicates. I already TPM normalized all the samples, but for comparisons, another between-sample normalization step seems like a good idea.

So far your explanations are of great help. Thank you all!


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


"I already TPM normalized all the samples"

If you are performing differential expression with DESeq2 or limma, don't transform the data. DESeq2 expects raw counts. For RNAseq with limma, you have to perform the voom transformation on the raw counts as well. Repeating: start with raw counts, not TPM, for both packages (and edgeR, for that matter).

3.0 years ago
h.mon 32k

Your question is poorly explained: what downstream analyses do you intend to perform? Are you moving from within-sample comparisons to differential expression analysis?

I believe DESeq2 performs neither quantile nor median normalization; only limma does.

About limma between-array normalization: quantile normalization makes the distribution of microarray intensity signals identical across all arrays being analysed. Median normalization (method="scale") gives all samples the same median.
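To make the intuition concrete, here is a minimal pure-Python sketch of both ideas. It is not limma's implementation: the function names `quantile_normalize` and `median_scale` and the toy values are made up for illustration, ties are not handled specially, and the median step assumes log-scale values, so it shifts each sample rather than rescaling it.

```python
from statistics import median


def quantile_normalize(columns):
    """Give every sample (one list per sample) the same distribution of values."""
    n_rows = len(columns[0])
    # Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from the lowest to the highest value in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, gene in enumerate(order):
            # Replace each value by the cross-sample mean for its rank.
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_scale(columns):
    """Shift log-scale samples so that all of them share the same median."""
    medians = [median(col) for col in columns]
    target = sum(medians) / len(medians)
    return [[x - m + target for x in col] for col, m in zip(columns, medians)]


# Hypothetical log-intensities: three samples, four genes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 8.0, 9.0],
]
qn = quantile_normalize(samples)
ms = median_scale(samples)
print(qn)
print([median(col) for col in ms])
```

After quantile normalization every sample contains exactly the same set of values, only assigned to different genes according to each gene's rank within its sample. After median scaling only the medians agree; the shape of each distribution is left untouched.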

DESeq2 and edgeR normalize for library size. Each package uses a different method, but the idea is to make the sequencing depth of all samples "the same". DESeq2 also provides transformations (rlog and vst) for exploratory analyses and visualization, but these are not used for differential expression analysis.

Some resources:

http://genomicsclass.github.io/book/pages/normalization.html

https://stats.stackexchange.com/questions/10744/how-does-quantile-normalization-work


DESeq's method for library normalization is median based: it builds a pseudo-reference sample from the geometric mean of each gene across all samples, computes the ratio of each sample's count to that reference for every gene, and takes the median of those ratios as the sample's size factor. Each sample's counts are then divided by its size factor. Obviously this is only smart if you think that only a small subset of your genes are significantly changing expression, so that the median ratio reflects sequencing depth rather than biology.
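A rough pure-Python sketch of this median-of-ratios idea, for intuition only. The function name `size_factors` and the toy counts are hypothetical, and this is not DESeq2's code; the one detail borrowed from the usual description of the method is that genes with a zero count in any sample are left out, since their geometric mean would be zero.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: one list of raw counts per sample, all in the same gene order."""
    n_genes = len(counts[0])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)  # skipped: a zero count in some sample
    factors = []
    for sample in counts:
        # Log ratio of this sample to the pseudo-reference, gene by gene.
        log_ratios = [
            math.log(sample[g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        # The size factor is the median ratio, taken back to the linear scale.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: the second sample was sequenced about twice as deeply as the first.
counts = [
    [10, 20, 30, 0],
    [20, 40, 60, 5],
]
factors = size_factors(counts)
normalized = [[c / f for c in sample] for sample, f in zip(counts, factors)]
print(factors)
print(normalized)
```

In this toy case the second sample's size factor comes out at twice the first one's, so dividing by the size factors puts the first three genes on the same scale in both samples.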