Quantile normalisation: raw rpkm or log2rpkm?
5.2 years ago
YaGalbi ★ 1.5k

Hi all,

I'm really stuck with this one, so any help would be appreciated.

I have expression data for 28 different tissues. The matrix I create will look something like the one below, but with 28 tissues and around 50k rows:

       tissue1    tissue2     tissue3 .....etc....
gene1
gene2
gene3
etc..


I'm going to use limma's normalizeBetweenArrays() function to quantile normalise the data. I can't figure out whether I should fill the matrix with the raw rpkm values or the log2-transformed values before passing it to the limma function. Which one should it be?

EDIT: I do get how quantile normalisation works, but I just don't know whether it is correct to use it on log2 values. I have read some resources on this; however, none of them is clear about what the input should be.

Thanks,

Kenneth

rnaseq quantile normalisation • 4.3k views
0

https://en.wikipedia.org/wiki/Quantile_normalization

Theory aside, see this recent paper as an example:

Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/

Also see this post and the articles mentioned at the bottom of it:

Normalization Of Gene Expression Using Rnaseq Rpkm Values

2

OP didn't ask for an explanation of quantile normalisation... sharing links can be helpful, but these don't appear to be specifically about this question. If the answer to his question is somewhere on those pages, why don't you give the answer and refer to the pages for further explanation?

2
5.2 years ago

If you are analyzing RNA-seq data using limma you should use the voom transformation on the raw counts as described in the user guide, chapter 15. Using RPKM has no place in this analysis, whether log transformed or not.
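A minimal sketch of that route, assuming a gene-by-sample matrix of raw counts (simulated here as `counts`) and an intercept-only design as a placeholder. The `normalize.method = "quantile"` argument is optional; voom passes it on to normalizeBetweenArrays(), so it is only needed if quantile normalisation is still wanted on top of voom:

```r
library(limma)

set.seed(1)
# Hypothetical raw counts: 200 genes x 6 samples (substitute your own matrix)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("tissue", 1:6)))

# Placeholder design with an intercept only; a real analysis would encode
# the experimental groups here
design <- matrix(1, nrow = ncol(counts), ncol = 1)

# voom converts the counts to log2-CPM and estimates precision weights;
# normalize.method = "quantile" additionally quantile normalises the log2-CPM
v <- voom(counts, design, normalize.method = "quantile")

dim(v$E)   # v$E holds the normalised log2-CPM values, genes x samples
```

The object `v` can then go straight into lmFit(), which uses the weights stored alongside `v$E`.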

0

I have quantile normalised the log2 data using another method ... and it returned the same results.

0

I am sorry but this is hardly an intelligible statement.

0

Apologies for the typo.

I have quantile normalised the log2 data using another R package:

library("preprocessCore")
x <- normalize.quantiles(my_data_matrix)


This method returns the same results as

library("limma")
y <- normalizeBetweenArrays(my_data_matrix)
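The agreement is expected if both calls perform plain quantile normalisation, which is the default method of normalizeBetweenArrays() for a matrix. A base-R sketch of that algorithm follows; it is an illustration, not either package's actual code, and `quantile_normalise` and the toy data are hypothetical names. It assumes no missing values and breaks ties arbitrarily, so results on data with many ties may differ slightly from the packages:

```r
# Base-R sketch of quantile normalisation (illustrative only)
quantile_normalise <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank of each value within its column
  sorted <- apply(m, 2, sort)                          # each column sorted ascending
  ref    <- unname(rowMeans(sorted))                   # reference distribution: mean at each rank
  out    <- apply(ranks, 2, function(r) ref[r])        # map each rank back onto the reference
  dimnames(out) <- dimnames(m)
  out
}

set.seed(1)
# Hypothetical toy data: 6 genes x 3 tissues
expr <- matrix(rnorm(18, mean = 5, sd = 2), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("tissue", 1:3)))

normalised <- quantile_normalise(expr)

# After normalisation every column holds the same set of values
apply(normalised, 2, sort)
```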

0
5.2 years ago
ssv.bio ▴ 190

I am not sure why normalizeBetweenArrays is being used for NGS data (the OP mentioned rpkm data, and I am assuming the rpkms come from NGS data).

As for the values, either one should work, as I understand from the following line in the manual:

Normalizes expression intensities so that the intensities or log-ratios have similar distributions across a set of arrays.


Here "intensities" means raw values, and "log-ratios" are on the log scale (as per my understanding).
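To see what the choice of scale changes in practice, the two orders of operation can be compared on toy data. This is only a sketch: `qn` is a hypothetical base-R helper rather than the limma function, the values are simulated, and it assumes there are no zeros (real rpkm data would need a pseudocount before taking log2). Because log2 is monotonic, the within-tissue gene ranks are the same either way, but the normalised values differ, since averaging log values is not the same as taking the log of an average:

```r
# Hypothetical helper: plain quantile normalisation of the columns of m
qn <- function(m) {
  ref <- unname(rowMeans(apply(m, 2, sort)))   # mean of the sorted columns at each rank
  out <- apply(m, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(m)
  out
}

set.seed(42)
# Simulated rpkm-like values: 5 genes x 3 tissues, strictly positive
rpkm <- matrix(rlnorm(15, meanlog = 2, sdlog = 1.5), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("tissue", 1:3)))

qn_then_log <- log2(qn(rpkm))   # normalise the raw values, then log-transform
log_then_qn <- qn(log2(rpkm))   # log-transform first, then normalise

# Same gene ranking within every tissue ...
all(apply(qn_then_log, 2, rank) == apply(log_then_qn, 2, rank))

# ... but the values themselves are not the same
max(abs(qn_then_log - log_then_qn))
```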