Question: Normalizing Count Data in RNA-Seq
8.2 years ago, Arun (2.3k) wrote:

Hello. Suppose I have RNA-seq data for 1) a control, T0; 2) treatment after 4 hours, T4; and 3) treatment after 8 hours, T8. I would like to find the genes that are differentially expressed between each pair of these conditions (T0 vs T4 and T0 vs T8 are the most informative/essential to the experimenter).

I perform normalization using the edgeR TMM method. However, so far I have been normalizing the count data separately for each pair (A). That is, for T0 vs T4, I take the counts, perform the TMM normalization, and obtain the candidate genes; then for T0 vs T8, I normalize those two sets of counts again and obtain the DE genes, and so on.

However, I am beginning to wonder whether this is the way to go, or whether I should perform a single normalization on the counts for all genes from all time points together (B).

I am not able to convince myself of a good reason to choose one over the other. Has any of you worked on this type of data, or do you have an idea why you would go for (A) or (B)?

Thank you.

8.2 years ago, Frenkiboy (250) wrote:

You can try the DESeq package. Its estimateSizeFactors function uses the complete dataset to perform the normalization.

Then you can test for differential expression on sample vs sample, or fit a GLM.
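To make "uses the complete dataset" concrete, here is a minimal sketch of median-of-ratios size factors, the idea behind estimateSizeFactors. It is a plain-Python illustration, not the package's code; the function name `size_factors` and the toy counts are assumptions made up for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean across ALL samples passed in.
    # Genes with a zero count in any sample are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Hypothetical toy counts: T4 is sequenced 2x and T8 4x as deeply as T0.
toy = {
    "T0": [10, 20, 30, 40, 0],
    "T4": [20, 40, 60, 80, 5],
    "T8": [40, 80, 120, 160, 0],
}
factors = size_factors(toy)
print(factors)
```

Because the per-gene reference is computed over every sample at once, all three time points are scaled against the same baseline.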


Thank you for your answer. However, I don't think the issue is whether edgeR has the option to normalize all (more than two) samples. Rather, which approach is better / right: normalizing each pair as and when I test for DE, or normalizing all samples together and then testing for DE on all pairs? From what you say, it seems it should be a single normalization and then DE on all pairs. Right?

Reply written 8.2 years ago by Arun (2.3k)

I think you have it right, yes.

Reply written 8.2 years ago by Sean Davis (26k)
8.2 years ago, seidel (7.1k, United States) wrote:

The problem with option A is that you calculate different normalization factors for T0 vs T4 and for T0 vs T8. Since T4 and T8 are related samples from the same time course, you will likely end up comparing the results for T4 with those for T8, but they will have been adjusted differently, so they will differ by this factor. With option B, everything in the pool has been adjusted to the same mean.
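A small toy calculation can illustrate this. The sketch below uses simple median-of-ratios scaling rather than TMM itself, on made-up counts in which no gene changes and only sequencing depth differs; `size_factors`, `normalize`, and the data are hypothetical names and values for illustration only.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (a stand-in for TMM in this sketch)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    factors = {}
    # Per-gene log geometric mean over the samples that were passed in.
    log_geo = [
        sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in range(n_genes)
    ]
    for s in samples:
        ratios = [math.log(counts[s][g]) - log_geo[g] for g in range(n_genes)]
        factors[s] = math.exp(median(ratios))
    return factors

def normalize(counts):
    """Divide each sample's counts by its size factor."""
    f = size_factors(counts)
    return {s: [c / f[s] for c in counts[s]] for s in counts}

# Hypothetical counts: identical expression, depth 1x / 2x / 4x.
toy = {
    "T0": [10, 20, 30, 40],
    "T4": [20, 40, 60, 80],
    "T8": [40, 80, 120, 160],
}

# Option A: normalize each pair separately.
pair_t4 = normalize({s: toy[s] for s in ("T0", "T4")})
pair_t8 = normalize({s: toy[s] for s in ("T0", "T8")})

# Option B: normalize all time points together.
pooled = normalize(toy)

# First gene, T4 vs T8, under each option.
print("A:", pair_t4["T4"][0], pair_t8["T8"][0])  # roughly 14.14 vs 20
print("B:", pooled["T4"][0], pooled["T8"][0])    # both roughly 20
```

Within each pair the comparison against T0 is still consistent, but the T4 and T8 values produced by the two separate runs sit on different scales, even though the gene is unchanged. With pooled normalization they agree.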


Got it. I had to make sure! :)

Reply written 8.2 years ago by Arun (2.3k)