Question: RNA-seq read count data normalization using housekeeping genes
5
wangyang703092 (China) wrote 6.2 years ago:

I have a set of RNA-seq data, and I used cuffdiff, DESeq2 and edgeR to normalize it. However, the expression levels of the housekeeping genes seem unstable across time points, i.e. they show large fold changes. Is there a way to normalize the expression matrix using the raw counts of a set of housekeeping genes as the reference, since those genes should be stably expressed? Should I do the normalization manually, is there a package I can use, or can I adjust the packages mentioned above to make this work?

rna-seq normalization • 7.9k views
modified 6.2 years ago by Devon Ryan • written 6.2 years ago by wangyang703092
5
Devon Ryan (Freiburg, Germany) wrote 6.2 years ago:

The question becomes how stably expressed the housekeeping genes actually are (they tend to be less stable than advertised). The simplest way to do what you mentioned is to subset the DESeqDataSet or DGEList to the housekeeping genes, normalize that subset, and apply the resulting normalization factors back to the full dataset.
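
The three steps above (subset, normalize, apply back) might look roughly like this in DESeq2. This is only a sketch: the data are simulated with makeExampleDESeqDataSet(), and hk_ids is a hypothetical list of housekeeping gene IDs that you would replace with your own.

```r
library(DESeq2)

# Simulated data stands in for a real experiment.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical housekeeping gene IDs; replace with your own list.
hk_ids <- paste0("gene", 1:200)

# 1. Subset to the housekeeping genes.
dds_hk <- dds[rownames(dds) %in% hk_ids, ]

# 2. Estimate size factors on the subset only.
dds_hk <- estimateSizeFactors(dds_hk)

# 3. Apply the resulting size factors to the full dataset.
sizeFactors(dds) <- sizeFactors(dds_hk)

# Normalized counts for all genes, scaled by the housekeeping-derived factors.
norm_counts <- counts(dds, normalized = TRUE)
```

Because the size factors are set explicitly, later steps such as DESeq() should reuse them rather than re-estimating them from all genes.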

written 6.2 years ago by Devon Ryan

Thanks a lot. As you say, the housekeeping genes' expression levels may not be that stable, and the approach you mentioned sounds good. Because I am not proficient with DESeq2, could you please describe in more detail how to "subset the DESeqDataSet or DGEList by the housekeeping genes, normalize that, and apply the resulting normalization factors back to the full dataset"? Thank you again!

modified 6.2 years ago • written 6.2 years ago by wangyang703092
1

You just need to make a list of IDs associated with your housekeeping genes. I think DGEList and DESeqDataSet objects have row.names() accessors, so just subset things with %in%.
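
As a small illustration of subsetting with %in%, here is a base R sketch on a toy count matrix. The gene names and counts are made up, and hk_ids is a hypothetical list; the same logical index should also work for row-subsetting a DESeqDataSet or DGEList (obj[is_hk, ]).

```r
# Toy count matrix: rows are genes, columns are samples.
counts <- matrix(
  c(500, 520, 480,
     10,  40,  90,
    300, 310, 295,
      0,   5,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ACTB", "GENE_A", "GAPDH", "GENE_B"),
                  c("t0", "t1", "t2"))
)

# Hypothetical housekeeping gene IDs; use identifiers matching your row names.
hk_ids <- c("ACTB", "GAPDH", "TBP")

# Logical vector marking which rows are housekeeping genes.
is_hk <- rownames(counts) %in% hk_ids

# IDs in the list that are absent from the data (worth checking for typos
# or mismatched identifier types, e.g. symbols vs. Ensembl IDs).
missing_ids <- setdiff(hk_ids, rownames(counts))

# Keep only the housekeeping rows.
hk_counts <- counts[is_hk, , drop = FALSE]
```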

modified 6.2 years ago • written 6.2 years ago by Devon Ryan

Do you mean that I should set aside everything except the housekeeping genes, normalize just the housekeeping genes to get a normalization factor, and finally use that factor to normalize the full dataset?

modified 6.2 years ago • written 6.2 years ago by wangyang703092

yes, exactly

written 6.2 years ago by Devon Ryan

OK, will DESeq2 give me the normalization factor, or do I have to calculate it manually?

written 6.2 years ago by wangyang703092

Just use the estimateSizeFactors() function.
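
For intuition about what estimateSizeFactors() returns, here is a base R sketch of the median-of-ratios idea it is based on. It is a simplified illustration, not the package's actual code, and median_of_ratios is a hypothetical helper name.

```r
# Median-of-ratios size factors for a genes x samples count matrix.
median_of_ratios <- function(counts) {
  # Per-gene log geometric mean across samples (-Inf if any count is zero).
  log_geo_means <- rowMeans(log(counts))
  # Only genes with no zero counts contribute.
  usable <- is.finite(log_geo_means)
  # Per sample: median over genes of count / geometric mean, on the log scale.
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
  })
}

# Toy example: sample s2 has exactly twice the counts of sample s1.
toy <- matrix(c(10, 20,
                100, 200,
                50, 100),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

sf <- median_of_ratios(toy)
sf[["s2"]] / sf[["s1"]]  # ratio of the two size factors
```

Passing only the housekeeping rows to such a function is what restricting the normalization to housekeeping genes amounts to.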
 

written 6.2 years ago by Devon Ryan
1

Sorry to bother you again. estimateSizeFactors() has a controlGenes argument, and the reference manual gives examples such as dds <- estimateSizeFactors(dds, controlGenes=1:200). Do you know what 1:200 means?

modified 6.2 years ago • written 6.2 years ago by wangyang703092

That's new, but quite convenient, since it saves the whole subsetting process. 1:200 means that the genes to use for normalization are the first 200 in the dataset. It's just a vector of indices.
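
Since controlGenes takes an index vector rather than gene names, a list of housekeeping IDs has to be converted to row indices first. A sketch, assuming a DESeq2 version that supports controlGenes and a hypothetical hk_ids list:

```r
library(DESeq2)

# Simulated data stands in for a real experiment.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical housekeeping gene IDs; replace with your own list.
hk_ids <- paste0("gene", 1:200)

# Convert the IDs to row indices of the dataset.
hk_idx <- which(rownames(dds) %in% hk_ids)

# Estimate size factors from the housekeeping genes only;
# the factors are stored on the full dataset.
dds <- estimateSizeFactors(dds, controlGenes = hk_idx)
sizeFactors(dds)
```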

written 6.2 years ago by Devon Ryan

But when I run the manual's example:

> dds <- makeExampleDESeqDataSet(n=1000, m=12)
> dds <- estimateSizeFactors(dds, controlGenes=1:200)
Error in .local(object, ...) : unused argument (controlGenes = 1:200)

Do you know what's wrong with that?

written 6.2 years ago by wangyang703092

The version specified in the manual and the one you have installed locally aren't the same.
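
One way to check for such a mismatch is to compare the installed version against the one on the manual's title page. The second check below is only a heuristic: it assumes that estimateSizeFactorsForMatrix() gained its controlGenes argument together with estimateSizeFactors(), which may not hold for every release.

```r
# Version of DESeq2 installed in the current R library.
installed_version <- packageVersion("DESeq2")
print(installed_version)

# Heuristic: does the installed matrix-level function accept controlGenes?
# (Assumption: this tracks support in estimateSizeFactors().)
has_control_genes <-
  "controlGenes" %in% names(formals(DESeq2::estimateSizeFactorsForMatrix))
print(has_control_genes)
```

If the argument is missing, updating DESeq2 (and, if needed, R/Bioconductor) should make the manual's example work; otherwise the manual subsetting route described in the answer remains available.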

written 6.2 years ago by Devon Ryan