I have a set of RNA-seq data, and I used cuffdiff, deseq2 and edgeR to do the normalization. However, the expression level of the housekeeping genes seems unstable during different time points, i.e. have great fold-change. I am wondering if there is any solutions to normalize the expression matrix using a set of housekeeping gene row count as reference since they should express stably. Should I do the normalization work manually or there is some package I can use, or adjust the packages i mentioned before to make it work?
Thanks a lot. Actually, as you say, the housekeeping genes' expression level may not be that stable. And the way you mentioned sound good. Because I don't use DEseq2 proficiently ,could you please give me more detailed process about "subset the DESeqDataSet or DGElist by the house keeping genes, normalize that, and apply the resulting normalization factors back to the full dataset". Thank you again!
You just need to make a list of IDs associated with your house keeping genes. I think
DGElists
andDESeqDataSet
objects haverow.names()
accessors, so just subset things with%in%
.Do you mean that I should leave the dataset except housekeeping gene behind, and just normalize the housekeeping genes, then I will get a normalization factor, finally use the factor to normalize the full dataset?
yes, exactly
Ok, will DEseq2 give me the normalization factor or calculate it manually?
Just use the
estimateSizeFactors()
function.Sorry to bother you again, the
estimateSizeFactors()
has thecontrolGenes
argument,and the Reference Manual lists the examples, such asdds <- estimateSizeFactors(dds, controlGenes=1:200)
, do you know what1:200
mean?That's new, but quite convenient, since it saves the whole subsetting process. 1:200 means that the genes to use for normalization are the first 200 in the dataset. It's just a vector of indices.
But when I do what the Manual's example does:
do you know what's wrong with that
The the version specified in the manual and the one you have locally aren't the same.