Question: What are median and quantile normalization?
pyKey40 wrote:

Hello everyone,

Normally I use TPM for within-sample analysis. Recently, someone suggested that I use median and quantile between-sample normalization. I noticed that the DESeq and limma packages offer these methods. But what are they actually doing? What is the intuition behind them?

Thank you all,

rna-seq normalization • 1.8k views
modified 12 months ago • written 12 months ago by pyKey40

Right! So more explanation:

I have a bunch of RNA-Seq experiments and I am performing some simple gene expression comparisons between two conditions (wildtype vs. mutants). Some conditions have at most two replicates. I already TPM normalized all the samples, but for comparisons, another between-sample normalization step seems like a good idea.

So far your explanations are of great help. Thank you all!

written 12 months ago by pyKey40

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

written 12 months ago by genomax65k

"I already TPM normalized all the samples"

If you are performing differential expression analysis with DESeq2 or limma, don't pre-normalize the data. DESeq2 expects raw counts. For RNA-seq with limma, the voom transformation must also be applied to the raw counts. To repeat: start with raw counts, not TPM, for both packages (and for edgeR, for that matter).

modified 12 months ago • written 12 months ago by h.mon24k
h.mon24k wrote:

Your question is poorly explained: what downstream analyses do you intend to perform? Are you moving from within-sample comparisons to differential expression analysis?

I believe DESeq2 performs neither quantile nor median normalization; only limma does.

About limma's between-array normalization: quantile normalization makes the distribution of microarray intensity signals identical across all arrays being analysed. Median normalization (method="scale") makes all samples have the same median.
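To make the two ideas concrete, here is a minimal sketch in plain Python. It is not limma's code: the function names `quantile_normalize` and `median_normalize` are made up for illustration, ties are handled naively, and the median version assumes log-scale values and shifts them additively, which may differ from how limma scales its input.

```python
from statistics import median


def quantile_normalize(columns):
    """Give every sample the same distribution of values.

    `columns` is a list of samples; each sample is a list of values for the
    same genes in the same order. Ties are broken by gene order, which is a
    simplification.
    """
    n_genes = len(columns[0])
    # Sort each sample independently.
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from the smallest to the largest value.
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def median_normalize(columns):
    """Shift each sample so that all samples share the same median."""
    medians = [median(col) for col in columns]
    target = sum(medians) / len(medians)
    return [[v - m + target for v in col] for col, m in zip(columns, medians)]


samples = [[2.0, 4.0, 6.0], [3.0, 9.0, 6.0]]
print(quantile_normalize(samples))  # [[2.5, 5.0, 7.5], [2.5, 7.5, 5.0]]
print(median_normalize(samples))    # [[3.0, 5.0, 7.0], [2.0, 8.0, 5.0]]
```

After quantile normalization both samples contain exactly the same set of values, only assigned to different genes. After median normalization only the medians agree, and the spread of each sample is left untouched.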

DESeq2 and edgeR normalize for library size. Each package uses a different method, but the idea is to make the sequencing depth of all samples "the same". DESeq2 also offers some transformations (rlog and vst) for exploratory analyses and visualization, but these are not used for differential expression analysis.

Some resources:

http://genomicsclass.github.io/book/pages/normalization.html

https://www.reddit.com/r/bioinformatics/comments/14eae2/can_someone_explain_median_normalization_to_me/

https://stats.stackexchange.com/questions/10744/how-does-quantile-normalization-work

written 12 months ago by h.mon24k

DESeq's method for library normalization is median based: it builds a pseudo-reference sample from the geometric mean of each gene across all samples, computes the ratio of each gene's count to that reference within every sample, and takes the median of those ratios as the sample's size factor. The counts of each sample are then divided by its size factor. Obviously this is only sensible if you think that only a small subset of your genes are significantly changing expression, so that the gene with the median ratio is a safe anchor for your counts.
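The median-based size-factor idea described above can be sketched in a few lines of plain Python. This illustrates the median-of-ratios approach and is not DESeq's actual code: the function name `size_factors` is hypothetical, and skipping genes with a zero count in any sample is an assumption made here so that the geometric mean stays defined.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample from raw counts.

    `counts` is a list of samples; each sample is a list of raw counts for
    the same genes in the same order.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    # Assumption: use only genes with a nonzero count in every sample.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    # Log of the per-gene geometric mean: the pseudo-reference sample.
    log_ref = {g: sum(log(s[g]) for s in counts) / n_samples for g in usable}
    factors = []
    for sample in counts:
        # Log ratio of each gene to the reference, then the median ratio.
        log_ratios = [log(sample[g]) - log_ref[g] for g in usable]
        factors.append(exp(median(log_ratios)))
    return factors


# The second sample is sequenced twice as deep; the third gene also changes.
raw = [[10, 20, 40, 0], [20, 40, 400, 5]]
factors = size_factors(raw)
normalized = [[c / f for c in sample] for sample, f in zip(raw, factors)]
print(factors)
print(normalized)
```

In this toy input the strongly changing third gene does not affect the result, because the median ignores it: the two size factors differ by a factor of two, matching the difference in depth of the unchanged genes.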

written 12 months ago by swbarnes25.2k