Question: Clarification for RNA-Seq Normalization

Assa Yeroslaviz · 12.7 years ago

Hi everybody,

Over the last few days I have read a lot about the differing opinions on RNA-seq normalization methods. To be honest, I'm quite confused at the moment, so I would like to ask for your help in clarifying which normalization method to use, and when.

I'm sure there is no straightforward answer to such a question, but I would really appreciate hearing contradictory opinions, if that helps explain the problem to other users as well.

As far as I understand, there is no "standard" normalization method.

We have one RNA-seq experiment with only a single sample for the control and a single sample for the treatment. Although the lack of replicates means we cannot assess statistical significance, I would like to understand how to work with RNA-seq data in general.

We would like to look at both differential expression and differences in splice variants between the two conditions. I have read opinions on how best to normalize the data for identifying differentially expressed genes and for identifying isoforms, and apparently these two goals should be analysed differently. The best example of this is the discussion between Simon and lpachter about when and how to normalize: http://seqanswers.com/forums/showthread.php?t=586&page=1

I think it shows how controversial this can be. I found the discussion interesting, though it is quite an old one and a lot has probably changed since.

RPKM is meant to measure relative gene expression levels across experiments, but apparently some people are against it because of certain biases it cannot compensate for. In the thread above, Simon mentions DESeq (or edgeR), which is supposed to work better for differential expression.
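A minimal NumPy sketch of the two approaches mentioned above, assuming a genes × samples matrix of raw counts. `rpkm` divides by gene length in kilobases and by library size in millions (approximated here by the column sums). `median_of_ratios_size_factors` is a reimplementation of the median-of-ratios idea described for DESeq, not the package's own code; both function names are hypothetical. The toy data are also hypothetical: four genes are unchanged and a fifth is strongly up in sample B. Because B's library is dominated by that one gene, RPKM roughly halves the unchanged genes in B (an RNA-composition effect, one commonly cited criticism of total-count scaling), while the median-of-ratios factors are unaffected.

```python
import numpy as np


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene per million mapped reads (genes x samples)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    # Library size in millions; here simply the column sum of the count matrix.
    millions = counts.sum(axis=0) / 1e6
    return counts / lengths_kb[:, None] / millions[None, :]


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors: median ratio of each sample to a pseudo-reference."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample (log of 0 is undefined).
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Pseudo-reference: per-gene geometric mean across samples (in log space).
    log_ref = log_counts.mean(axis=1)
    return np.exp(np.median(log_counts - log_ref[:, None], axis=0))


# Hypothetical toy data: genes 1-4 unchanged, gene 5 strongly up in sample B.
counts = np.array([[100, 100],
                   [200, 200],
                   [300, 300],
                   [400, 400],
                   [10, 1000]])
lengths = [1000] * 5

expr = rpkm(counts, lengths)
print(expr[0, 1] / expr[0, 0])                 # unchanged gene 1, B relative to A
print(median_of_ratios_size_factors(counts))   # one scaling factor per sample
```

The first print gives a ratio of about 0.5 for a gene whose raw counts did not change, while both size factors come out as 1, because the median ignores the single outlying gene.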

So my questions are:

Is it better to normalize the data twice, separately for each of the two goals?
Does it make sense to normalize the data one time after the other?
Can I rely on cuffdiff/cuffcompare to give me a good estimate of the splice variants, and on DESeq/SDEGSeq to give me a good estimate of the differentially expressed genes?

I would appreciate any comments or discussion.

Thanks, A.

Last updated 12.7 years ago by Ido Tamir.

Could you clarify your second question a bit? What do you mean by "one time after the other"? Do you mean to ask whether it makes sense to apply two normalization methods sequentially?

Chris Evelo · 12.7 years ago

Yes, exactly. I know from earlier microarray experiments that doing two sequential normalizations shifts the values further. That can be good, but not necessarily. So is it a good idea to run two sequential normalization procedures here, or is it better to run the two analyses completely separately from each other?

Assa Yeroslaviz · 12.7 years ago

score 2 · Answer 1 · 2011-08-05

On the question of how to combine different normalization methods, I can't give you an answer, except that you will probably violate the input assumptions of the second method.
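To make the "input assumptions" point concrete, here is a small NumPy sketch. It assumes the second method is a count-based one such as DESeq or edgeR, which model raw counts (negative binomial, with Poisson as the limiting case where variance equals mean) and take normalization factors into the model rather than expecting pre-scaled input. Multiplying Poisson counts by a constant c, as an RPKM-style division effectively does for a given gene and sample, gives values whose variance-to-mean ratio is c instead of 1, so the mean-variance relation such models rely on no longer holds. The Poisson mean, the scale factor and the helper name `variance_to_mean` are arbitrary illustrative choices.

```python
import numpy as np


def variance_to_mean(x):
    """Index of dispersion: about 1 for Poisson counts."""
    x = np.asarray(x, dtype=float)
    return x.var() / x.mean()


rng = np.random.default_rng(0)
# Simulated Poisson counts for a single gene (mean 50).
counts = rng.poisson(lam=50, size=100_000)

# Hypothetical constant standing in for a length/library-size division.
scale = 0.25
rescaled = counts * scale

print(variance_to_mean(counts))    # close to 1
print(variance_to_mean(rescaled))  # close to scale, i.e. about 0.25
```

Since var(cX) = c² var(X) while mean(cX) = c mean(X), the second ratio is exactly `scale` times the first, whatever the data.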

The discussion on normalization has moved on a lot, and it has been shown (cqn) that you can have, e.g., sample-specific (not gene-specific!) GC bias, which you can correct for and for which RPKM or global scaling is not enough. The cqn paper also briefly discusses other normalization methods and gives references to them.

So first check whether you find biases in your data (e.g. a GC-content/RPKM relationship that differs between samples), and then decide whether you need to apply further normalization.
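One rough way to run such a check for two samples, sketched in Python under the assumption that you have a GC fraction for every gene: after a global (counts-per-million) scaling, bin the genes by GC content and compare the median log2 ratio between the samples in each bin. A global scale factor shifts every bin by the same amount, so a trend across bins points to a sample-specific GC effect that global scaling cannot remove. This is only a diagnostic, not the cqn method itself (cqn is an R/Bioconductor package that performs conditional quantile normalization); the function name, bin count, pseudocount and simulated data are all hypothetical.

```python
import numpy as np


def gc_binned_log_ratio(counts_a, counts_b, gc_fraction, n_bins=4, pseudocount=0.5):
    """Median log2(B/A) per GC-content bin after global CPM scaling."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    gc = np.asarray(gc_fraction, dtype=float)
    # Global scaling: counts per million, with a pseudocount to avoid log(0).
    a_cpm = (a + pseudocount) / a.sum() * 1e6
    b_cpm = (b + pseudocount) / b.sum() * 1e6
    log_ratio = np.log2(b_cpm / a_cpm)
    # Quantile bin edges so that each bin holds roughly the same number of genes.
    edges = np.quantile(gc, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, gc, side="right") - 1, 0, n_bins - 1)
    return np.array([np.median(log_ratio[bins == k]) for k in range(n_bins)])


# Hypothetical data: 400 genes with GC fractions from 0.30 to 0.70.
gc = np.linspace(0.30, 0.70, 400)
sample_a = np.full(400, 1000.0)
# Sample B: counts rise with GC content (a simulated GC bias).
sample_b = np.round(1000.0 * 2 ** (4 * (gc - 0.5)))
# Sample C: twice as deep as A but with no GC bias.
sample_c = 2 * sample_a

print(gc_binned_log_ratio(sample_a, sample_b, gc))  # increases across bins
print(gc_binned_log_ratio(sample_a, sample_c, gc))  # same value in every bin
```

With real data, many tied GC values can leave a bin empty, in which case `np.median` returns NaN for that bin; fewer bins or rank-based binning avoids this.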

And you need biological replicates.