Question

How to evaluate different RNA-seq normalization methods?

0

4.1 years ago

dz2353 ▴ 120

Hi there,

Is there a standard or strategy for evaluating normalization methods for RNA-seq read counts? I would like to compare the expression level of the same gene between different samples. I know that in this case I do not need to account for exon length, only for sequencing depth. I have chosen several methods, including CPM, UP, TMM, DESeq2, and deconvolution methods, but I do not know how to evaluate them. I plotted some statistics, such as the coefficient of variation against the mean and the variance, but there does not seem to be much difference between the methods. Is there any way to work out which method is best? Thank you in advance for any answers, ideas, and suggestions.
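For readers who want to reproduce the kind of check described above, here is a minimal sketch in plain Python of CPM scaling followed by a per-gene coefficient of variation. The function names `cpm` and `per_gene_cv` and the toy count matrix are illustrative assumptions, not part of the original post or of any package.

```python
import statistics

def cpm(sample_counts):
    """Scale one sample's raw counts to counts per million (depth only)."""
    total = sum(sample_counts)
    return [c * 1e6 / total for c in sample_counts]

def per_gene_cv(matrix):
    """Coefficient of variation (sd / mean) of each gene across samples.

    matrix: list of genes, each a list with one value per sample.
    """
    cvs = []
    for row in matrix:
        mean = statistics.mean(row)
        cvs.append(statistics.stdev(row) / mean if mean > 0 else float("nan"))
    return cvs

# Toy data (hypothetical): rows = genes, columns = samples.
# Sample 2 is sequenced exactly twice as deep as sample 1.
raw = [
    [10, 20],
    [100, 200],
    [890, 1780],
]

# Normalize column by column, then transpose back to genes x samples.
normalized_columns = [cpm(list(col)) for col in zip(*raw)]
normalized = [list(row) for row in zip(*normalized_columns)]

raw_cv = per_gene_cv(raw)
cpm_cv = per_gene_cv(normalized)
```

In this toy case the only difference between samples is depth, so the per-gene CV drops to zero after CPM scaling. On real data, a lower CV among genes expected to be stable is only one of several possible criteria.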

RNA-Seq • 1.3k views

ADD COMMENT • link updated 4.1 years ago by Kevin Blighe 87k • written 4.1 years ago by dz2353 ▴ 120

1

I suggest you read one of the many benchmarking papers that compare these methods, available via PubMed. As usual, there is no "best". TMM and RLE (the one from DESeq2) typically perform comparably well. Honestly, I would not spend too much time on these comparisons: benchmarking is an art of its own, and you really need a sophisticated setup to extract meaningful information. Check the available papers if you really want to repeat these evaluations. Better to spend time on interpreting the results than on benchmarking yourself. Both edgeR and DESeq2 are perfectly fine and accepted for RNA-seq. What you should ask yourself is whether your data violate the assumption behind the normalization, which is that a large number of genes do not change between conditions.

ADD REPLY • link 4.1 years ago by ATpoint 81k

0

Thank you for your reply. I agree with you; there is no need to spend too much time on normalization. Regarding the assumption you mentioned, can I understand it as meaning that most genes show similar expression levels between samples?

ADD REPLY • link 4.1 years ago by dz2353 ▴ 120

0

The median ratio normalization in DESeq2 doesn't rely on as strong an assumption as "most genes don't change". See Michael Love's post at https://support.bioconductor.org/p/61604/

That said, if most genes do show similar expression levels between samples, the median ratio works well at capturing size factor differences between samples.
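To make the idea concrete, here is a rough sketch of median-of-ratios size factors in plain Python. It follows the general idea (the ratio of each sample to a per-gene geometric mean, summarized by the median) and is not DESeq2's actual code; the function name `size_factors` and the toy counts are assumptions made for illustration.

```python
import math
import statistics

def size_factors(matrix):
    """Median-of-ratios size factors.

    matrix: rows = genes, columns = samples, raw counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero and the ratio is undefined.
    """
    n_samples = len(matrix[0])
    log_rows = [[math.log(c) for c in row]
                for row in matrix if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean = mean of the log counts.
    log_geo_means = [sum(r) / len(r) for r in log_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [r[j] - g for r, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

# Toy data (hypothetical): sample B is twice as deep as sample A,
# and one gene (row 5) is strongly up-regulated in B.
counts = [
    [10, 20],
    [50, 100],
    [100, 200],
    [200, 400],
    [30, 600],
    [0, 5],
]

sf = size_factors(counts)
median_ratio = sf[1] / sf[0]

# For comparison: the ratio of total counts, which the changed gene inflates.
totals = [sum(col) for col in zip(*counts)]
depth_ratio = totals[1] / totals[0]
```

Here `median_ratio` recovers the two-fold depth difference because most genes are unchanged, while `depth_ratio` is pulled well above 2 by the single up-regulated gene.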

ADD REPLY • link 4.1 years ago by dsull ★ 5.8k

score 0 · Answer 1 · 2020-03-25

0

4.1 years ago

Kevin Blighe 87k

Somewhat old but still relevant; please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Kevin

ADD COMMENT • link 4.1 years ago by Kevin Blighe 87k