Question: PEER normalization after DESeq run
0
gravatar for tonja.r
3.7 years ago by
tonja.r460
UK
tonja.r460 wrote:

 

 

In PEER-paper they write:

When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes and variance-stabilize expression data sets.
 

And I got confused by DESeq2 and DESeq.
In DESeq (varianceStabili... depends on the estimateDispersions):

> cdsBlind = estimateDispersions( cds, method="blind" )
> vsd = varianceStabilizingTransformation( cdsBlind )


In DESeq2:

vsd <- varianceStabilizingTransformation(dds)


I am getting the same results from varianceStabilizingTransformation() if I use DESeq functions as DESeq(), estimateSizeFactors(), estimateDispersions() before or not.
 

> dds = DESeqDataSetFromMatrix(countData=histone_m, colData=Design, design=~condition)
>   pik = DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
>   
>   vst=varianceStabilizingTransformation(pik) 
>   head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555
>   
>   vst=varianceStabilizingTransformation(dds) 
>   head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555
>   
>   
>   cds=estimateSizeFactors(dds)
>   cds = estimateDispersions( cds)
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
>   vst=varianceStabilizingTransformation(cds) 
>   head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555

 

So, it seems it does not matter if I use DESeq function before or not. Does it make sense? Also, once I estimate SizeFactors and varianceStabilizingTransformation, how should I plug it into PEER?

 

 

R • 2.0k views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by tonja.r460
1
gravatar for informatics bot
3.7 years ago by
United States
informatics bot570 wrote:

From my understanding, there is negligible difference between the Variance Stabilizing Transformation functions of DESeq and DESeq2. I would suggest you use DESeq2 because it is the most up-to-date version.

 

library(peer)

# assign vst data to variable called expr_set
expr_set = assay(vst)    

# the PEER tutorial recommends k == 25% of “number of samples”  
k = ncol(expr_set) * .25

# create empty peer model
model = PEER()

# set number of “hidden factors” searched for
PEER_setNk(model,k)

# PEER ask NxG matrix, where N=samples and G=genes
PEER_setPhenoMean(model,as.matrix(t(expr_set)))

PEER_setAdd_mean(model, TRUE)

# These can be changed to the tolerance you desire
PEER_setTolerance(model, .001)
PEER_setVarTolerance(model, 0.0005)

# usually doesn’t go over 200 iterations        
PEER_setNmax_iterations(model, 1000)

# run peer
PEER_update(model)

 

ADD COMMENTlink written 3.7 years ago by informatics bot570

Thank you for peer. However, I am still confused about estimated size factors. You did not include it into the peer model and as seen from my example above, it does not matter if I run estimateSizeFactors() or DESeq() before varianceStabilizingTransformation() or not , it will always give me the same result, meaning size factors are not considered.

ADD REPLYlink written 3.7 years ago by tonja.r460

Why do you want to use the "Estimated Size Factors" for PEER?

ADD REPLYlink written 3.7 years ago by informatics bot570

They state in the paper: When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes. I guess they are referring to the effective library size e.g. to the normalization of the counts in order to account for the sequencing depth. And it can be done by estimateSizeFactors and then one can obtain the normalized counts by  o=counts(cds,normalized=TRUE). But in the downstream analysis of DESeq, sizeFactors are used intern to normalize the counts. And in fact I would like to account for the sequencing depth because my libraries are quite different.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by tonja.r460
1

The Bioconductor RNAseq Workflow explicitly states, "Sequencing depth correction is done automatically for the rlog method (and for varianceStabilizingTransformation)." My point being, VST already corrects for different sequencing depths between libraries, you do not need to account for it in PEER.

ADD REPLYlink written 3.7 years ago by informatics bot570

one question more: the corrected data (normalized counts) set (for downstream analysis) I can obtain by PEER_getResiduals(model) as stated in the example, can't I? They seem to be centered at zero (in the example, simple unsupervised_demo) and at 0.1 in my case. 

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by tonja.r460

Yeah, PEER_getResiduals() gives you the "corrected normalized counts," which is what I commonly use for downstream analysis. I believe this Nature Protocol might be helpful to you:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3398141/

ADD REPLYlink written 3.7 years ago by informatics bot570
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1615 users visited in the last hour