Question

Which normalization is good for RNA-Seq? How is normalization calculated?

0

4.6 years ago

takoyaki ▴ 120

Hi, everyone. I want to ask about "normalization". This is a very confusing term for me.

I am assuming a basic bulk RNA-Seq pipeline, like hisat2 → featureCounts → DESeq2. In this situation, I want to draw a PCA plot, a dendrogram, a co-scatter plot and a heatmap.

Currently, I am normalizing as follows.

PCA analysis: R function prcomp(data, scale = TRUE)
dendrogram: none (I use distances calculated from the raw count matrix)
co-scatter plot: I have no idea which method I should use
heatmap: Z-score calculated from raw count matrix

I have a few questions.

Is my normalization appropriate?
Which method is good for a co-scatter plot?
I understand the Z-score, but for the other methods, what are the objectives and goals of normalization?
Why do some methods use log values? Also, isn't the meaning of the expression values lost by normalization?

Thanks

rna-seq RNA-Seq R • 1.9k views

2

Please read the DESeq2 manual ( http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html ); it is all explained there. DESeq2 offers a convenience function for PCA (plotPCA) and a couple of normalizing transformations (vst / rlog) to apply upstream of PCA, clustering or other machine learning applications.

Z-score calculated from raw count matrix

Whatever you do in bioinformatics, clustering and similar analyses should never be run on raw data. By that I mean that one always has to normalize the data prior to any analysis. DESeq2 itself accepts raw counts, normalizes them internally and then performs the differential analysis. Heatmaps can indeed be based on the Z-score, but it should be computed on log2-transformed normalized counts. One transformation that both normalizes the counts and puts them on a log-like scale is vst.
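To make the heatmap advice concrete, here is a minimal sketch of a row-wise Z-score computed on log2-transformed normalized counts. It is plain Python rather than R, purely to illustrate the arithmetic; it is not DESeq2 code. The function name row_zscores, the pseudocount of 1 and the example numbers are assumptions made for this illustration, and the input is assumed to be already normalized.

```python
import math
from statistics import mean, stdev


def row_zscores(normalized_counts, pseudocount=1.0):
    """Row-wise Z-scores of log2(normalized count + pseudocount).

    normalized_counts: list of rows (one per gene), each a list of
    per-sample normalized counts. The pseudocount is an assumption
    made here to avoid taking log2(0).
    """
    result = []
    for row in normalized_counts:
        logged = [math.log2(x + pseudocount) for x in row]
        mu = mean(logged)
        sd = stdev(logged)  # sample standard deviation (n - 1)
        if sd == 0:
            # A gene with no variation across samples shows no pattern.
            result.append([0.0] * len(logged))
        else:
            result.append([(v - mu) / sd for v in logged])
    return result


# Hypothetical normalized counts: 2 genes x 4 samples
example = [
    [7.0, 15.0, 31.0, 63.0],       # log2(x + 1) gives 3, 4, 5, 6
    [100.0, 100.0, 100.0, 100.0],  # constant gene
]
z = row_zscores(example)
```

Each row of z is centred on zero, so the heatmap shows the per-gene pattern across samples rather than absolute expression levels. In practice, the values fed to the Z-score would typically come from vst or rlog instead of a simple log2 with a pseudocount.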

ADD REPLY • link 4.6 years ago by ATpoint 82k

0

Thanks. So, in bulk RNA-Seq, processing the raw count matrix with vst or rlog before any analysis is standard, right?

ADD REPLY • link 4.6 years ago by takoyaki ▴ 120

0

In any *-seq you have to normalize. Please read e.g. https://peerj.com/preprints/27283/ and get a solid background before analyzing data.

ADD REPLY • link 4.6 years ago by ATpoint 82k

0

To me, all the methods you mentioned above are 'scaling' methods, not normalization.

ADD REPLY • link 4.6 years ago by shoujun.gu ▴ 350

0

What is the main difference between "scaling" and "normalization"?

ADD REPLY • link 4.6 years ago by takoyaki ▴ 120

score 2 · Accepted Answer · 2019-09-16

It seems that you first need to understand the general flow of a [bulk] RNA-seq experiment.

Normalisation aims to counteract sources of bias in the raw counts (for example, differences in sequencing depth between samples), so that we can compare samples to each other, e.g., for differential expression.
Scaling / transforming takes the normalised data and, loosely speaking, makes it suitable for downstream analyses like clustering, PCA, etc., which work best when the data are approximately normally distributed.

Generally, the process goes like this:

raw counts
normalised counts <- differential expression analysis performed on these
transformed data <- downstream analyses use these, i.e., PCA, clustering, etc.
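
The three stages above can be sketched as follows. This is a rough illustration in plain Python, assuming a median-of-ratios size factor in the spirit of what DESeq2 uses for normalisation, followed by a simple log2(x + 1) transform; it is not DESeq2's actual code, and vst / rlog are more sophisticated than the log step shown here. The function names and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows (genes), each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts, factors):
    """Divide each sample (column) by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def log_transform(normalised, pseudocount=1.0):
    """Simple log2 transform; the pseudocount avoids log2(0)."""
    return [[math.log2(x + pseudocount) for x in row] for row in normalised]


# Hypothetical raw counts: 4 genes x 2 samples, where the second
# sample was sequenced about twice as deeply as the first.
raw = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 4],
]
sf = size_factors(raw)           # stage 1 -> 2: estimate size factors
norm = normalise(raw, sf)        # normalised counts
transformed = log_transform(norm)  # stage 3: transformed data
```

In this toy matrix the first three genes differ between the samples only by sequencing depth, so after dividing by the size factors their normalised counts agree between the two samples.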

In your very brief textual description of your analysis pipeline, featureCounts will derive the raw counts, and then DESeq2 will perform the normalisation and transformation. However, you have not shown any of your code, so we cannot know exactly what you have done.