Question

RNA-seq read count data normalization using housekeeping genes

6

9.6 years ago

wangyang703092 ▴ 120

I have a set of RNA-seq data, and I used cuffdiff, DESeq2 and edgeR to do the normalization. However, the expression levels of the housekeeping genes seem unstable across time points, i.e. they show large fold changes. Is there a way to normalize the expression matrix using the raw counts of a set of housekeeping genes as the reference, since they should be stably expressed? Should I do the normalization manually, is there a package I can use, or can I adjust the packages I mentioned above to make this work?

normalization RNA-Seq • 9.6k views

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

Ram · Answer 1 · 2014-10-26

5

9.6 years ago

Devon Ryan 104k

The question becomes how stably expressed the housekeeping genes actually are (they tend to be less stable than advertised). The simplest way to do what you mentioned is to subset the DESeqDataSet or DGEList to the housekeeping genes, normalize that subset, and apply the resulting normalization factors back to the full dataset.
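To make the idea concrete, here is a minimal base-R sketch of "normalize on the housekeeping subset, apply to everything". It uses a median-of-ratios estimator written by hand, which is an assumption about which normalization you want; the count matrix, the gene IDs in `hk_ids`, and the helper `median_of_ratios()` are all made up for illustration and are not part of any package.

```r
# Toy raw counts: 6 genes x 4 samples (made-up numbers for illustration only).
counts <- matrix(
  c(100, 200, 400, 100,
     50, 100, 200,  50,
     10,  80,   5, 300,
    500,  20,  60,  90,
     30,  60, 120,  30,
      7,   3,  40,  11),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# Hypothetical housekeeping gene IDs.
hk_ids <- c("gene1", "gene2", "gene5")

# Hand-written median-of-ratios size factors for a count matrix.
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))    # per-gene log geometric mean
  usable <- is.finite(log_geo_means)   # skip genes with a zero count in any sample
  apply(m[usable, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_means[usable]))
  })
}

# 1. Subset to the housekeeping genes.
hk_counts <- counts[rownames(counts) %in% hk_ids, , drop = FALSE]

# 2. Estimate one size factor per sample from that subset only.
size_factors <- median_of_ratios(hk_counts)

# 3. Apply those factors to the full matrix (divide each column by its factor).
normalized <- sweep(counts, 2, size_factors, "/")
```

In this toy matrix the three housekeeping genes change in lockstep across samples, so after step 3 their rows come out flat while the other genes keep their differences. With real data the housekeeping genes will not agree this neatly, which is exactly the stability caveat above.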

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k

0

Thanks a lot. As you say, the housekeeping genes' expression levels may not be that stable, but the approach you mentioned sounds good. Since I'm not proficient with DESeq2, could you please describe in more detail how to "subset the DESeqDataSet or DGEList by the housekeeping genes, normalize that, and apply the resulting normalization factors back to the full dataset"? Thank you again!

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

1

You just need to make a list of the IDs associated with your housekeeping genes. I think DGEList and DESeqDataSet objects have row.names() accessors, so just subset them with %in%.
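For the edgeR side, the `%in%` subsetting might look roughly like the sketch below. The simulated counts and the IDs in `hk_ids` are hypothetical stand-ins for your own data, and TMM via `calcNormFactors()` is just the default method, not a recommendation. The subset keeps the original library sizes (`keep.lib.sizes = TRUE`), because edgeR's normalization factors are defined relative to them.

```r
library(edgeR)

set.seed(1)
# Simulated counts: 1000 genes x 6 samples (illustration only).
counts <- matrix(rnbinom(1000 * 6, mu = 100, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

# Hypothetical housekeeping gene IDs; replace with your own list.
hk_ids <- paste0("gene", 1:50)

y <- DGEList(counts = counts)

# Logical index of housekeeping rows, then subset while keeping
# the full-library sizes in y_hk$samples$lib.size.
is_hk <- rownames(y) %in% hk_ids
y_hk <- y[is_hk, , keep.lib.sizes = TRUE]

# Normalize the housekeeping subset only (TMM by default).
y_hk <- calcNormFactors(y_hk)

# Copy the resulting factors back onto the full dataset.
y$samples$norm.factors <- y_hk$samples$norm.factors
```

Downstream edgeR functions read `y$samples$norm.factors` together with `y$samples$lib.size`, so the full object now carries the housekeeping-derived normalization.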

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k

0

Do you mean that I should set aside everything except the housekeeping genes, normalize only the housekeeping genes to get normalization factors, and then use those factors to normalize the full dataset?

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

0

Yes, exactly.

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k

0

OK, will DESeq2 give me the normalization factors, or do I have to calculate them manually?

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

0

Just use the estimateSizeFactors() function.
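Putting that together for DESeq2, a sketch of the subset-then-copy-back route could look like this. The example dataset comes from `makeExampleDESeqDataSet()`, and the "housekeeping" IDs in `hk_ids` are purely hypothetical (just the first 200 simulated genes), so substitute your own object and ID list.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset with rows named gene1 ... gene1000 (illustration only).
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical housekeeping gene IDs; replace with your own list.
hk_ids <- paste0("gene", 1:200)

# Subset to the housekeeping genes and estimate size factors there.
dds_hk <- dds[rownames(dds) %in% hk_ids, ]
dds_hk <- estimateSizeFactors(dds_hk)

# Apply the housekeeping-derived size factors to the full dataset.
sizeFactors(dds) <- sizeFactors(dds_hk)

# Normalized counts for all genes, using those size factors.
norm_counts <- counts(dds, normalized = TRUE)
```

Once `sizeFactors(dds)` is set, a later `DESeq(dds)` call should reuse the existing size factors rather than re-estimating them from all genes.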

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k

1

Sorry to bother you again. estimateSizeFactors() has a controlGenes argument, and the reference manual gives an example: dds <- estimateSizeFactors(dds, controlGenes=1:200). Do you know what 1:200 means?

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

0

That's new, but quite convenient, since it saves the whole subsetting process. 1:200 means that the genes to use for normalization are the first 200 in the dataset. It's just a vector of indices.
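Since `controlGenes` takes an index vector, you would normally build it from gene IDs rather than relying on row order. A sketch, assuming a DESeq2 version that supports `controlGenes` and using a hypothetical ID list `hk_ids` on the simulated example data:

```r
library(DESeq2)

set.seed(1)
# Simulated dataset with rows named gene1 ... gene1000 (illustration only).
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical housekeeping gene IDs; replace with your own list.
hk_ids <- paste0("gene", 1:200)

# Turn the IDs into row indices of the full dataset.
hk_idx <- which(rownames(dds) %in% hk_ids)

# Estimate size factors from those rows only; no manual subsetting needed.
dds <- estimateSizeFactors(dds, controlGenes = hk_idx)

sizeFactors(dds)
```

Here `hk_idx` happens to be 1:200 only because the made-up IDs are the first 200 rows; with a real housekeeping list the indices will be scattered through the dataset.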

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k

0

But when I run the example from the manual:

> dds <- makeExampleDESeqDataSet(n=1000, m=12)
> dds <- estimateSizeFactors(dds, controlGenes=1:200)
Error in .local(object, ...) : unused argument (controlGenes = 1:200)

Do you know what's wrong with that?

updated 3.2 years ago by Ram 44k • written 9.6 years ago by wangyang703092 ▴ 120

0

The version described in the manual and the one you have installed locally aren't the same. The controlGenes argument is new, so an older local DESeq2 won't recognize it.
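A quick way to check which version you actually have, so you can compare it with the version printed on the manual's title page:

```r
# Version of DESeq2 installed in this R library.
installed_version <- packageVersion("DESeq2")
print(installed_version)

# Full session details (R version, loaded packages), handy when asking for help.
sessionInfo()

# If the local version is older than the manual's, updating Bioconductor
# packages is the usual fix. With current Bioconductor that is done through
# the BiocManager package (an assumption about your setup), e.g.:
# BiocManager::install("DESeq2")
```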

updated 3.2 years ago by Ram 44k • written 9.6 years ago by Devon Ryan 104k