I have a set of RNA-seq data, and i used cuffdiff,deseq2 and edgeR to do the normalization. However, the expression level of the housekeeping genes seems unstable during different time points,ie. have great fold-change.I am wondering if there is any solutions to normalize the expression matrix using a set of housekeeping gene row count as reference since they should express stably.Should i do the normalization work manually or there is some package i can use, or adjust the packages i mentioned before to make it work ?
The question becomes how stably expressed the house keeping genes actually are (they tend to be less stable than advertised). The simplest way to do what you mentioned is to subset the DESeqDataSet or DGElist by the house keeping genes, normalize that, and apply the resulting normalization factors back to the full dataset.
Thanks a lot.Actually, as you say, the housekeeping genes' expression level may not be that stable.And the way you mentioned sound good.Because i don't use DEseq2 proficiently ,could you please give me more detailed process about "subset the DESeqDataSet or DGElist by the house keeping genes, normalize that, and apply the resulting normalization factors back to the full dataset". Thank you again!
You just need to make a list of IDs associated with your house keeping genes. I think DGElists
and DESeqDataSet
objects have row.names()
accessors, so just subset things with %in%
.
do you mean that i should leave the dataset except housekeeping gene behind,and just normalize the housekeeping genes,then i will get a normalization factor, finally use the factor to normalize the full dataset?
Ok, will DEseq2 give me the normalization factor or calculate it manually ?
Just use the estimateSizeFactors() function.
Sorry to bother you again, the estimateSizeFactors()
has the controlGenes
argument,and the Reference Manual lists the examples, such as dds <- estimateSizeFactors(dds, controlGenes=1:200) ,
do you know what 1:200
mean?
That's new, but quite convenient, since it saves the whole subsetting process. 1:200 means that the genes to use for normalization are the first 200 in the dataset. It's just a vector of indices.
But when i do what the Manual's example does:
> dds <- makeExampleDESeqDataSet(n=1000, m=12) > dds <- estimateSizeFactors(dds, controlGenes=1:200) Error in .local(object, ...) : unused argument (controlGenes = 1:200)
do you know what's wrong with that