Normalizing Count Data in RNA-Seq
11.9 years ago
Arun 2.4k

Hello, suppose I have RNA-seq data for 1) a control, say T0, 2) treatment after 4 hours, T4, and 3) treatment after 8 hours, T8, and I would like to find the genes that are differentially expressed between each pair of these (T0 vs T4 and T0 vs T8 being the most informative/essential to the experimenter).

I perform normalization using the edgeR TMM method. So far, I have been normalizing the count data separately for each pair (A). That is, for T0 vs T4, I take the counts, perform TMM normalization, and obtain the candidate genes; then for T0 vs T8, I normalize those two sets of counts again and obtain the DE genes, and so on.

However, I am beginning to wonder whether this is the way to go, or whether I should perform a single normalization using the counts for all genes from all time points together (B).

I cannot convince myself of a good reason to choose one over the other. Has any of you worked on this type of data, or do you have an idea why you would go for (A) or (B)?

Thank you.

edger rna-seq differential-expression • 5.6k views
11.9 years ago
Frenkiboy ▴ 250

You can try the DESeq package. Its estimateSizeFactors function uses the complete dataset to perform the normalization.

Then you can test for differential expression on sample vs sample, or fit a GLM.
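
To make "uses the complete dataset" concrete, here is a minimal sketch of the median-of-ratios idea behind that kind of size-factor estimate. It is written in plain Python rather than R, it is not the DESeq source code, and the function name size_factors and the toy counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: dict mapping sample name -> list of per-gene counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes with a nonzero count in every sample can be used,
    # because the reference is a geometric mean (log of 0 is undefined).
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across ALL samples: the shared reference.
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Each sample's factor is the median ratio of its counts to that reference.
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in usable))
            for s in samples}

# Hypothetical toy counts: T4 is sequenced twice as deep as T0, T8 half as deep.
toy = {
    "T0": [100, 200, 300, 400],
    "T4": [200, 400, 600, 800],
    "T8": [50, 100, 150, 200],
}
factors = size_factors(toy)
print(factors)
```

Because the reference is built from every sample at once, each sample's factor is expressed on one common scale, which is the property that matters for the question above.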


Thank you for your answer. However, I don't think the issue is whether edgeR has the option to normalize across all (or more than two) samples. Rather, which approach is better / right: normalizing each pair as and when I test it for DE, or normalizing all samples together and then testing for DE on each pair? From what you say, it seems to be the latter: normalize everything together, then test each pair. Right?


I think you have it right, yes.

11.9 years ago
seidel 11k

The problem with option A is that you calculate different normalization factors for T0 vs T4 and for T0 vs T8. Since T4 and T8 are related samples from the same time course, you will almost inevitably end up comparing the T4 results with the T8 results, but the two will have been adjusted differently, so they will differ by that factor. With option B, everything in the pool has been adjusted to the same mean.
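
A toy illustration of that point, as a sketch only: it uses a simple median-of-ratios factor in plain Python as a stand-in, not edgeR's TMM, and the counts and the helper name size_factors are hypothetical. In the made-up data, T4 and T8 differ from T0 only in sequencing depth, so nothing is truly differentially expressed.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios factors; assumes every count is > 0 (toy data only)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Per-gene log geometric mean over whichever samples were passed in.
    log_ref = [sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in range(n_genes)]
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in range(n_genes)))
            for s in samples}

counts = {
    "T0": [100, 200, 300, 400],
    "T4": [200, 400, 600, 800],   # same profile, twice the depth
    "T8": [50, 100, 150, 200],    # same profile, half the depth
}

# Option A: a separate normalization for each pair.
pair_t4 = size_factors({k: counts[k] for k in ("T0", "T4")})
pair_t8 = size_factors({k: counts[k] for k in ("T0", "T8")})

# Option B: one normalization over all samples.
pooled = size_factors(counts)

gene = 0  # look at the first gene's normalized counts
t4_a = counts["T4"][gene] / pair_t4["T4"]
t8_a = counts["T8"][gene] / pair_t8["T8"]
t4_b = counts["T4"][gene] / pooled["T4"]
t8_b = counts["T8"][gene] / pooled["T8"]

print("T0 factor in T0-vs-T4 run:", pair_t4["T0"])
print("T0 factor in T0-vs-T8 run:", pair_t8["T0"])
print("Option A, T4 vs T8 normalized:", t4_a, t8_a)
print("Option B, T4 vs T8 normalized:", t4_b, t8_b)
```

In this toy case each pairwise test is still internally consistent, but T0 receives a different factor in each run, so the normalized T4 and T8 values from option A sit on different scales and appear to differ even though only depth changed. With option B they land on the same scale.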


Got it. I had to make sure! :)
