How to get normalized count table from DESeq?
5 weeks ago
leranwangcs ▴ 60

Hi,

I'm using DESeq2 to compare differential abundance. Here is my code:

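# Convert the phyloseq object to a DESeqDataSet
ds.all <- phyloseq_to_deseq2(ps0.infant.pbs, ~ sample_type)
# Per-ASV geometric means (gm_mean is a user-defined helper, not shown here)
geoMeans <- apply(counts(ds.all), 1, gm_mean)
# Estimate size factors from the supplied geometric means
ds.all <- estimateSizeFactors(ds.all, geoMeans = geoMeans)
# Fit the model and run the test
dds.all <- DESeq(ds.all, fitType = "local")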

As a result, I got 8 ASVs that were significantly different.

My questions:

1. I used geoMeans because otherwise DESeq() would fail with this error:

Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :
  every gene contains at least one zero, cannot compute log geometric means

So does that mean these two steps:

geoMeans <- apply(counts(ds.all), 1, gm_mean)
ds.all <- estimateSizeFactors(ds.all, geoMeans = geoMeans)

are the normalization steps of DESeq? Or are there other hidden normalization steps in DESeq?

2. How can I extract the normalized count table that DESeq used to find the 8 ASVs? I tried counts(ds.all), but that table is exactly the same as the raw count table.

3. I also tried counts(dds.all, normalized=TRUE). It does look like a normalized count table, but how can I know whether it is the exact normalized count table that DESeq used for its analysis?

Thanks! Leran


I take it that for this application it's okay for every gene to have a zero? In the regular bulk RNA-seq that DESeq was designed for, that's not typical unless a couple of failed samples are included.


The phyloseq manual instructs you to do exactly that, so... ¯\_(ツ)_/¯

5 weeks ago

Something like this should work:

Normalized <- counts(dds.all, normalized=TRUE)
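
To check what that call returns, here is a minimal sketch. It assumes DESeq2 is installed and uses the simulated data set from makeExampleDESeqDataSet() as a stand-in for dds.all; the names dds, norm_counts and by_hand are made up for illustration. It shows that, when no gene-specific normalization factors have been set, the normalized table is just the raw counts with each sample divided by its size factor:

```r
library(DESeq2)

set.seed(1)
# Simulated data set shipped with DESeq2; a stand-in for dds.all.
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)

# Normalized counts as returned by DESeq2.
norm_counts <- counts(dds, normalized = TRUE)

# The same table built by hand: divide each sample (column) by its size factor.
by_hand <- sweep(counts(dds), 2, sizeFactors(dds), "/")

# Stops with an error if the two tables differ.
stopifnot(isTRUE(all.equal(norm_counts, by_hand, check.attributes = FALSE)))
```

Note that DESeq() fits its model on the raw counts and uses the size factors as offsets, so the normalized table is not literally the input to the test. It reflects the same size factors, though, which makes it the right table for plotting or inspecting the significant ASVs.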


Thanks! I know that DESeq contains multiple normalization methods. How can I know whether Normalized <- counts(dds.all, normalized=TRUE) gives me exactly the count table that DESeq used for the DE analysis? And what does geoMeans do here?

Thanks!


The geometric means are an attempt to dampen the effect of outliers and of genes that genuinely differ between samples, so that the resulting size factor balances the samples at a neutral baseline. See for details:
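
To make that concrete, here is a rough base-R sketch of the median-of-ratios idea with a zero-tolerant geometric mean. The gm_mean definition is an assumption (the original post does not show it; this is the form commonly used in phyloseq-style workflows), the count matrix is made-up toy data, and this is a simplification of the idea rather than DESeq2's exact internals, which may, for example, rescale the size factors afterwards:

```r
# Zero-tolerant geometric mean (assumed definition, not taken from the post):
# zeros are skipped in the log sum but still counted in the denominator.
gm_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

# Toy count matrix: rows are ASVs, columns are samples. Every row has a zero,
# which is the situation that triggers the error in the question.
counts <- matrix(
  c(10,  0, 20,
     0,  8, 16,
     5, 10,  0,
    40, 80,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), paste0("S", 1:3))
)

# The plain geometric mean collapses to 0 for every row because log(0) = -Inf.
naive_geo_means <- apply(counts, 1, function(x) exp(mean(log(x))))

# The zero-tolerant version gives a usable per-ASV reference value.
geo_means <- apply(counts, 1, gm_mean)

# Size factor per sample: median ratio of its non-zero counts to the reference.
size_factors <- apply(counts, 2, function(cnts) {
  keep <- cnts > 0 & geo_means > 0
  exp(median(log(cnts[keep]) - log(geo_means[keep])))
})

# Normalized counts: each sample (column) divided by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
```

Taking the median of the ratios is what dampens the influence of outliers: a handful of ASVs that really do differ between samples shifts the median far less than it would shift a mean or a total-count scaling.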