Question: How does edgeR do ERCC spike-in normalization?
moushengxu wrote (2.2 years ago):

It seems "calcNormFactors()" only normalizes by library size. It would be a big loss if the ERCC spike-in information were not used.

Thanks.

eldronzhou wrote (2.2 years ago):

You can simply calculate normalization factors on the ERCC spike-ins and pass them to the downstream analysis:

 # spikes: logical or index vector selecting the ERCC spike-in rows of the DGEList x
 x$samples$norm.factors <- calcNormFactors(x[spikes, ])$samples$norm.factors
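A slightly fuller sketch of the same idea, using simulated Poisson counts in place of real data and assuming the spike-in rows carry the usual "ERCC-" prefix (the object names counts, y and y_endo are hypothetical). It computes TMM factors on the spike-ins only and then keeps those factors for the endogenous genes.

```r
library(edgeR)

set.seed(1)
# Simulated data (hypothetical): 1000 endogenous genes + 92 ERCC spike-ins, 4 samples.
n_endo <- 1000
n_spike <- 92
counts <- matrix(rpois((n_endo + n_spike) * 4, lambda = 50), ncol = 4)
rownames(counts) <- c(paste0("gene", seq_len(n_endo)),
                      sprintf("ERCC-%05d", seq_len(n_spike)))
group <- factor(c("ctrl", "ctrl", "trt", "trt"))

y <- DGEList(counts = counts, group = group)
spikes <- grepl("^ERCC-", rownames(y))  # assumes spike-in rows are named "ERCC-..."

# TMM factors computed from the spike-ins only. Subsetting a DGEList keeps the
# full library sizes by default (keep.lib.sizes = TRUE), so the factors rescale
# the whole-library effective sizes.
y$samples$norm.factors <- calcNormFactors(y[spikes, ])$samples$norm.factors

# Drop the spike-ins for the DE analysis; the spike-in-based factors are kept.
y_endo <- y[!spikes, ]
```

y_endo can then go through the usual edgeR steps such as estimateDisp(), glmQLFit() and glmQLFTest().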

BTW, in the RUV paper the authors suggest that ERCC spike-ins do not behave like endogenous genes, so global normalisation based on ERCC spike-ins can lead to poorly normalised counts.


Thanks for the reply. I read the abstract of the RUV paper as well. One researcher in our lab told me that he has some experience with ERCC normalization: without it, he got some ridiculous results.

BTW, how do I use RUV to do ERCC normalization?

(reply by moushengxu, 2.2 years ago)
browseVignettes("RUVSeq")
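The vignette covers RUVSeq's own functions. To make the idea concrete, here is a stripped-down base-R sketch of what RUVg does, written for illustration and not taken from RUVSeq's source: log-transform the counts, run an SVD on the centred spike-in columns only to estimate k factors of unwanted variation W, then regress every gene on W. The simulated batch effect and the function name ruvg_sketch are hypothetical.

```r
set.seed(2)
# Simulated data (hypothetical): 500 endogenous genes + 50 spike-ins, 6 samples.
# A hidden batch multiplies every gene (including spike-ins) in samples 4-6.
n_endo <- 500; n_spike <- 50; n_samp <- 6
batch  <- rep(c(0, 1), each = 3)
lambda <- 100 * exp(0.5 * batch)
n_gene <- n_endo + n_spike
counts <- matrix(rpois(n_gene * n_samp, lambda = rep(lambda, each = n_gene)),
                 ncol = n_samp)
spikes <- rep(c(FALSE, TRUE), times = c(n_endo, n_spike))

# RUVg-style estimate of unwanted factors from negative controls (sketch only).
ruvg_sketch <- function(counts, ctrl, k = 1, epsilon = 1) {
  Y  <- t(log(counts + epsilon))                 # samples x genes, log scale
  Yc <- scale(Y, center = TRUE, scale = FALSE)   # centre each gene
  s  <- svd(Yc[, ctrl, drop = FALSE])            # factor analysis on controls only
  W  <- s$u[, seq_len(k), drop = FALSE]          # samples x k unwanted factors
  alpha <- solve(crossprod(W), crossprod(W, Y))  # regress every gene on W
  list(W = W, logNormalized = t(Y - W %*% alpha))
}

res <- ruvg_sketch(counts, spikes, k = 1)
```

In RUVSeq itself, the corresponding call on a count matrix is RUVg(counts, spikes, k = 1), which returns a list with W and normalizedCounts; the vignette then adds the columns of W as covariates to the edgeR design matrix rather than running DE on the normalized counts.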
(reply by eldronzhou, 2.2 years ago)

The RUV method assumes the RLE distribution is centered around 0, but I am not sure this is always the case. In our experiment, the cells are treated with toxic chemicals, and gene expression is largely reduced by the harmful effect of the treatment. IMHO, ERCC normalization is the most logically sound of all the kinds of normalization (library size, RUV, etc.), isn't it?

As you said above, it seems reasonable to me to normalize by the ERCC spike-ins and then feed the data to edgeR. I'm not sure why the edgeR authors don't like any sort of normalization, though.

(reply by moushengxu, 2.1 years ago)

Normalisation methods such as TMM (the default method of calcNormFactors) and other global scaling methods also assume that there is no global shift in gene expression. I agree with you that normalising on the spike-ins is a good idea in your case.
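A toy illustration of why the no-global-shift assumption matters in the experiment described above, with made-up counts: the treated sample has the same endogenous counts as the control but twice the spike-in counts, i.e. endogenous expression is halved relative to the spike-ins. For simplicity it uses plain library-size scaling as the global method (TMM relies on the same assumption but is not reproduced here) and scales by the spike-in totals rather than computing TMM on the spike-ins.

```r
# Toy counts: 4 endogenous genes + 2 ERCC spike-ins (hypothetical names).
counts <- matrix(
  c(100, 200, 300, 400,  50,  50,   # control
    100, 200, 300, 400, 100, 100),  # treated
  ncol = 2,
  dimnames = list(c(paste0("gene", 1:4), "ERCC-00002", "ERCC-00003"),
                  c("control", "treated"))
)
is_spike <- grepl("^ERCC-", rownames(counts))

# Global scaling by total library size (assumes no global shift).
lib_size <- colSums(counts)
cpm_lib  <- t(t(counts) / lib_size) * 1e6

# Scaling by the spike-in total only.
spike_size <- colSums(counts[is_spike, ])
norm_spike <- t(t(counts) / spike_size)

# Treated / control ratio for the endogenous genes under each scheme.
ratio_lib   <- cpm_lib[!is_spike, "treated"] / cpm_lib[!is_spike, "control"]
ratio_spike <- norm_spike[!is_spike, "treated"] / norm_spike[!is_spike, "control"]
```

The library-size ratio comes out near 1 (1100/1200, about 0.92, because the extra spike-in reads inflate the treated library), so the global drop is hidden, while the spike-in ratio recovers the halving (0.5).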

The RUV method does not assume that the RLE distribution is centered around 0. The RLE (relative log expression) plot is a diagnostic plot for checking whether your samples have similar distributions, under the belief that most genes are not differentially expressed (which, as you mentioned, is not true in your case). There are other things to check as well (PCA, p-value distribution, positive controls, etc.). The assumption of RUVg is that the factors of unwanted variation estimated from the spike-ins span the same linear space as the factors of unwanted variation for all genes [1].
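A minimal base-R sketch of the RLE statistic behind that diagnostic plot (each gene's log count minus its median across samples), using simulated counts in which one sample has a global drop, roughly the situation described earlier in the thread. The helper name rle_values is hypothetical; the RUVSeq vignette draws the same kind of plot with plotRLE.

```r
# Relative log expression: each gene's log count minus its median across samples.
rle_values <- function(counts, pseudo = 1) {
  logc <- log(counts + pseudo)
  sweep(logc, 1, apply(logc, 1, median))
}

set.seed(3)
# Hypothetical data: 300 genes, 3 samples; sample 3 has a global drop in expression.
counts <- cbind(rpois(300, 50), rpois(300, 50), rpois(300, 25))
rle_mat <- rle_values(counts)

# Per-sample medians: near 0 for samples 1-2, clearly negative for sample 3.
apply(rle_mat, 2, median)
```

When most genes are not DE, each sample's RLE distribution should sit near 0; a sample shifted as a whole, as here, shows up as an offset box, which under a genuine global change in expression reflects biology rather than a normalisation failure.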

You may also try other methods such as supervised svaseq [2] or cyclic loess regression on the spike-ins [3]. In my experience, supervised svaseq behaves similarly to RUVg. I don't have much experience using cyclic loess on ERCC spike-ins, but judging from the RUV paper it does not seem to perform very well on RNA-seq data.
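To show what loess regression on spike-ins means, here is a base-R sketch for just two samples, an illustration rather than the exact procedure of [3]: compute M (log-ratio) and A (average log-expression), fit a loess curve of M on A using the spike-ins only, and remove the fitted trend from every gene. Cyclic loess repeats this kind of pairwise adjustment over all pairs of samples (limma's normalizeCyclicLoess implements a multi-sample version). The function name loess_on_spikes and the simulated 2-fold technical bias are hypothetical.

```r
# Loess normalization of two samples, with the curve fitted on spike-ins only (sketch).
loess_on_spikes <- function(x1, x2, spike, span = 0.75) {
  l1 <- log2(x1 + 1)
  l2 <- log2(x2 + 1)
  M <- l2 - l1                      # log-ratio
  A <- (l1 + l2) / 2                # average log-expression
  fit <- loess(M ~ A, data = data.frame(M = M[spike], A = A[spike]),
               span = span, control = loess.control(surface = "direct"))
  trend <- predict(fit, newdata = data.frame(A = A))  # fitted trend for every gene
  cbind(l1 + trend / 2, l2 - trend / 2)               # split the correction
}

# Hypothetical data: sample 2 has a purely technical 2-fold bias for all genes.
spike_x1 <- round(10 * 2^seq(0, 10, length.out = 40))
endo_x1  <- round(15 * 2^seq(0, 9, length.out = 200))
x1 <- c(endo_x1, spike_x1)
x2 <- 2 * x1
spike <- rep(c(FALSE, TRUE), times = c(length(endo_x1), length(spike_x1)))

norm <- loess_on_spikes(x1, x2, spike)
m_after <- norm[, 2] - norm[, 1]    # log-ratios after normalization, close to 0
```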

Refs:

  1. Normalization of RNA-seq data using factor analysis of control genes or samples

  2. svaseq: removing batch effects and other unwanted noise from sequencing data

  3. Revisiting global gene expression analysis

(reply by eldronzhou, 2.1 years ago)