Question: Normalization scaling factors: formula for applying them to raw counts
14 months ago, sovrappensiero10 wrote:

Hello,

I am using both edgeR and DESeq2 to normalize raw counts (it's not RNA-seq data or 16S amplicon-seq data, but it is amplicon-seq data). I just need to normalize them before creating a visualization. It's preliminary work, so the parts of these packages that test for differential expression are not useful to me.

I have two sets of scaling factors (from edgeR, using the TMM and RLE methods). My question is: what is the correct way to apply these scaling factors to my raw counts? Is it:

```
raw count / scaling factor
```

or

```
raw count / (library size * scaling factor)
```

I've been researching these methods, and so far I have seen it done both ways. I'm still not sure how to get the normalization factors from DESeq2, as I only installed that package yesterday evening. But I've kept the DESeq2 tag because the question applies to both packages, and any advice regarding DESeq2 could be helpful to me and others.

Rookie question: the dispersion calculation would make sense for evaluating DE, not as part of the normalization, right?

Thanks for the help.

Tags: R, edgeR, DESeq2, normalization
13 months ago, Kevin Blighe wrote:

To normalise, you do just divide by the size factor (assuming that you have arrived at your size factors in the correct way). This is shown in a good worked example here: Normalization
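To make the two formulas from the question concrete, here is a minimal Python sketch with made-up counts and factors (not output from either package); the function names are hypothetical. A DESeq2 size factor is divided into the counts directly, which is what `counts(dds, normalized=TRUE)` returns. edgeR's `calcNormFactors()` instead returns normalisation factors, and edgeR defines the effective library size as library size × normalisation factor. So for edgeR's TMM or RLE factors the second formula is the one that applies, usually multiplied by 1e6 to give counts per million, as edgeR's `cpm()` does by default.

```python
# Toy counts: rows are features (e.g. amplicons), columns are samples.
# All numbers here are made up for illustration.
counts = [
    [10, 20, 30],
    [100, 180, 330],
    [5, 12, 14],
]


def divide_by_size_factors(counts, size_factors):
    """DESeq2-style size factors: normalized = raw count / size factor."""
    return [[c / sf for c, sf in zip(row, size_factors)] for row in counts]


def cpm_from_norm_factors(counts, norm_factors):
    """edgeR-style normalisation factors:
    CPM = raw count / (library size * norm factor) * 1e6."""
    n_samples = len(counts[0])
    # Library size = total raw counts per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Effective library size = library size * normalisation factor.
    effective = [lib * nf for lib, nf in zip(lib_sizes, norm_factors)]
    return [[c / eff * 1e6 for c, eff in zip(row, effective)] for row in counts]


print(divide_by_size_factors(counts, [0.5, 1.0, 1.5])[0])  # [20.0, 20.0, 20.0]
print(cpm_from_norm_factors(counts, [0.9, 1.0, 1.1])[0])
```

The key point is which kind of factor you hold: a size factor already includes sequencing depth, while an edgeR normalisation factor only adjusts the library size.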

To obtain the DESeq2 normalisation factors (size factors) in the first place, first estimate them in DESeq2 with `estimateSizeFactors(dds)` (which `DESeq(dds)` also runs), and then retrieve them with `sizeFactors(dds)`. This is described in the vignette: Analyzing RNA-seq data with DESeq2
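For intuition about what those size factors are, here is a plain-Python sketch of the median-of-ratios calculation that `estimateSizeFactors()` uses by default. It is an illustration, not DESeq2's code, and the function name is hypothetical. Each feature's geometric mean across samples acts as a pseudo-reference; features with a zero in any sample are skipped (their geometric mean is zero); and each sample's factor is the exponentiated median of its log ratios to that reference.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors, one per sample column.

    counts: list of rows (features), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Skip features with any zero: the geometric mean would be zero.
        if any(c == 0 for c in row):
            continue
        # Log of the geometric mean across samples (the pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = exp(median log ratio of the sample to the reference).
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Sample 2 has exactly twice the depth of sample 1; the last row is skipped.
sf = median_of_ratios_size_factors([[10, 20], [50, 100], [3, 6], [0, 5]])
print(sf)  # roughly [0.707, 1.414]
```

These are the factors you would then divide the raw counts by.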

For dispersion, take a look at my answer here: A: Clarification on how DSEeq2 Dispersion Curve is Generated. Dispersion is indeed used for the DE analysis; it is not needed to estimate the size factors.
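For reference, the negative binomial model described in the DESeq2 paper (Love, Huber and Anders, 2014) shows the division of labour, written here with the usual per-sample size factor. $K_{ij}$ is the count for feature $i$ in sample $j$, $s_j$ the size factor, $q_{ij}$ a quantity proportional to the true abundance, and $\alpha_i$ the dispersion:

$$
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad \mu_{ij} = s_j\, q_{ij}, \qquad \operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i\, \mu_{ij}^2
$$

The size factor only scales the mean, so the normalised counts $K_{ij} / s_j$ involve no $\alpha_i$; the dispersion only enters the variance, which matters when testing for DE.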


Thank you! That was very helpful.

@Kevin: Is this method still valid for scale factors generated by upper-quartile or scaled-median normalization? Are RLE and the median-of-ratios method described in your link the same calculation? And the same question for the median and scaled-median methods: are they the same calculation?

I cannot say that every normalisation method just involves dividing by a particular size factor; each has its own formula that may or may not involve a 'size factor'.
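As an example of a factor that is not divided into the counts directly, here is a plain-Python sketch modelled on edgeR's upper-quartile option (`calcNormFactors(y, method = "upperquartile")`). It assumes the calculation works as follows: drop features that are zero in every sample, take each sample's 75th percentile divided by its library size, then rescale the factors to a geometric mean of 1. The result multiplies the library size, so it is applied with the raw count / (library size × factor) formula. The function names are hypothetical; `quantile_type7` reimplements R's default quantile definition.

```python
import math


def quantile_type7(values, p):
    """Linear-interpolation quantile, matching R's default (type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def upper_quartile_norm_factors(counts, p=0.75):
    """Upper-quartile normalisation factors, one per sample.

    Rescaled to a geometric mean of 1; they multiply the library size.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Drop features that are zero in every sample.
    kept = [row for row in counts if any(c > 0 for c in row)]
    raw = [
        quantile_type7([row[j] for row in kept], p) / lib_sizes[j]
        for j in range(n_samples)
    ]
    # Rescale so that the factors have a geometric mean of 1.
    log_mean = sum(math.log(f) for f in raw) / n_samples
    return [f / math.exp(log_mean) for f in raw]


# Sample 2 is exactly twice sample 1, so both factors come out as 1.
print(upper_quartile_norm_factors([[1, 2], [2, 4], [3, 6], [4, 8], [0, 0]]))
```

The effective library size for sample j is then lib_sizes[j] × factor[j], and counts are divided by that.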

From what I understand, the median-of-ratios method is an extension of RLE, and it is currently the method used by DESeq2, as per the link that I gave. For 100% clarification, I would suggest re-posting your question on the Bioconductor support forum, where the DESeq2 developers are more likely to respond.

A good practice would be to calculate the size factors manually, then via DESeq2, and compare the two; you'll then have empirical evidence of exactly how it works.
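Along those lines, here is a plain-Python sketch that computes both quantities by hand on the same hypothetical toy counts. It assumes edgeR's RLE factor is the median-of-ratios factor divided by library size, then rescaled to a geometric mean of 1. Under that assumption, library size × RLE factor recovers the DESeq-style size factors up to one shared constant, so both give the same relative normalisation. The function names are hypothetical.

```python
import math
import statistics


def median_of_ratios(counts):
    """DESeq-style size factors: median ratio of each sample to the
    per-feature geometric mean, using features with no zero counts."""
    n = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_gm = [sum(math.log(c) for c in row) / n for row in usable]
    factors = []
    for j in range(n):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_gm)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def library_sizes(counts):
    """Column sums: total raw counts per sample."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def rle_norm_factors(counts):
    """edgeR-style RLE factors (assumed): median-of-ratios factor divided
    by library size, then rescaled to a geometric mean of 1."""
    raw = [sf / lib for sf, lib in zip(median_of_ratios(counts), library_sizes(counts))]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]


# Hypothetical toy counts: rows are features, columns are samples.
counts = [[10, 25, 40], [200, 410, 820], [33, 70, 120], [0, 3, 5], [7, 15, 29]]
sf = median_of_ratios(counts)
nf = rle_norm_factors(counts)
# library size * RLE factor / size factor: the same constant for every sample.
ratios = [lib * f / s for lib, f, s in zip(library_sizes(counts), nf, sf)]
print(ratios)
```

Running the same comparison against `sizeFactors(dds)` and edgeR's `norm.factors` on your own data would show whether the real packages behave as assumed here.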