Question

Manually calculating log2 fold change values from DESeq2 normalized counts

16 months ago

jayeshkumarsundaram • 0

I need to calculate log2 fold change values for many experimental conditions, each compared to its corresponding control. I am not going to use these for differential expression analysis but for other downstream analyses such as clustering. Traditionally in my field, counts are normalized by the TPM method and fold changes are then calculated as log2(TPM_exp+1)-log2(TPM_control+1) [using 1 or 0.5 as the pseudocount for the log transformation]. In my case, I realized that TPM is not a good way to normalize the data, as I have a few samples with a lot of reads mapping to only one or two genes [RNA composition bias]. DESeq2's median-of-ratios normalization seems to take care of that issue, so I prefer using DESeq2 normalization. But I cannot use DESeq2 to get log2 fold change values because I don't have replicates for some of the experimental conditions, and DESeq2 needs replicates to estimate log2 fold changes. So I want to calculate them manually from DESeq2 normalized counts, using log2(DESeq2norm_exp+0.5)-log2(DESeq2norm_control+0.5). I am not sure whether this is a good idea, or whether the choice of pseudocount is critical here. Any comments or help would be really appreciated.
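To make the manual calculation described above concrete, here is a minimal base-R sketch. The count matrix is made up for illustration, and the size factors are a hand-written version of the median-of-ratios idea rather than a call into DESeq2 itself, so treat it as an approximation of what `estimateSizeFactors()` would do. The helper `manual_lfc()` is a hypothetical name, not a DESeq2 function.

```r
# Toy count matrix (made-up numbers): genes in rows, samples in columns.
# gene5 dominates the "exp" library to mimic RNA composition bias.
counts <- matrix(
  c(   0,    2,
      10,   25,
     100,  210,
    1000, 1900,
      50, 5000),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("control", "exp"))
)

# Median-of-ratios size factors written out by hand.
# Genes with a zero count in any sample have a non-finite log geometric
# mean and are left out of the reference.
log_counts   <- log(counts)
log_geo_mean <- rowMeans(log_counts)
usable       <- is.finite(log_geo_mean)
size_factors <- exp(apply(
  log_counts[usable, , drop = FALSE] - log_geo_mean[usable], 2, median
))

# Normalized counts: divide each column by its size factor.
norm <- sweep(counts, 2, size_factors, "/")

# Hypothetical helper: log2 fold change of exp over control with a pseudocount.
manual_lfc <- function(norm, pseudo) {
  log2(norm[, "exp"] + pseudo) - log2(norm[, "control"] + pseudo)
}

# Same data, three pseudocounts, side by side.
lfc_table <- sapply(c(pc_0.5 = 0.5, pc_1 = 1, pc_5 = 5),
                    function(p) manual_lfc(norm, p))
print(round(lfc_table, 2))
```

In this toy case the pseudocount barely moves the fold change of the highly expressed genes but changes the low-count gene considerably, which is the sensitivity the question is asking about.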

Differential-gene-expression Log2FC DESeq2 • 4.1k views

ADD COMMENT • link updated 16 months ago by ATpoint 82k • written 16 months ago by jayeshkumarsundaram • 0

score 0 · Answer 1 · 2022-12-29

16 months ago

ATpoint 82k

You do not need replication in every group; replication in at least one group is enough to run the fold change calculation. While this is obviously not a reliable way of doing the analysis, using the fold changes from the standard pipeline, including its normalization, even for unreplicated contrasts is probably still better than any naive approach. Note though that fold changes without p-values are not helpful, as large fold changes with equally large standard errors have essentially no meaning. I would try to do clustering (or any downstream analysis) on transformed counts (for example the vst transformation from DESeq2) if at all possible. Fold changes probably don't help here.

Example:

suppressMessages(library(DESeq2))

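set.seed(1)
# Simulate 4 samples, then relabel them so that only group A is replicated
dds <- makeExampleDESeqDataSet(m=4)
dds$condition <- factor(c("A", "A", "B", "C"))
# Runs without error even though B and C have a single sample each
dds <- DESeq(dds)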
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
colData(dds)
#> DataFrame with 4 rows and 2 columns
#>         condition sizeFactor
#>          <factor>  <numeric>
#> sample1         A    1.06893
#> sample2         A    1.01469
#> sample3         B    1.01039
#> sample4         C    1.03356

# contrast = c(factor, numerator level, denominator level)
res1 <- results(dds, contrast = c("condition", "A", "B"))
res2 <- results(dds, contrast = c("condition", "A", "C"))
res3 <- results(dds, contrast = c("condition", "B", "C"))
Created on 2022-12-29 with reprex v2.0.2
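As a follow-up sketch on the same simulated data, the fold changes can be pulled out next to their standard errors (the `lfcSE` column), and the samples can be clustered on variance-stabilized values instead of fold changes. This assumes DESeq2 is installed. `varianceStabilizingTransformation()` is used here rather than the faster `vst()` wrapper because `vst()` fits on a subset of rows and may refuse a small simulated dataset like this one. The object names `lfc`, `vsd` and `hc` are arbitrary.

```r
suppressMessages(library(DESeq2))

# Rebuild the example so this block runs on its own.
set.seed(1)
dds <- makeExampleDESeqDataSet(m = 4)
dds$condition <- factor(c("A", "A", "B", "C"))
dds <- DESeq(dds, quiet = TRUE)

res1 <- results(dds, contrast = c("condition", "A", "B"))
res3 <- results(dds, contrast = c("condition", "B", "C"))

# Fold changes together with their standard errors; B vs C has no
# replicates on either side, so the standard errors deserve a look.
lfc <- data.frame(
  A_vs_B    = res1$log2FoldChange,
  A_vs_B_se = res1$lfcSE,
  B_vs_C    = res3$log2FoldChange,
  B_vs_C_se = res3$lfcSE,
  row.names = rownames(res1)
)
head(lfc)

# Variance-stabilized values (size-factor normalized, roughly log2 scale)
# as input for sample clustering.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
hc  <- hclust(dist(t(assay(vsd))))
plot(hc)
```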

ADD COMMENT • link 16 months ago by ATpoint 82k

Thanks ATpoint for your reply. I don't want to use count data for clustering because it still carries some technical or study-related bias. I believe that by calculating log fold change values I can get rid of that bias to some extent. I think it is worth trying vst() on the count data, because I believe it performs median-of-ratios normalization before the variance stabilizing transformation. I believe this also takes care of the inflation of fold change values at low counts. Do you think my understanding is correct?

ADD REPLY • link 16 months ago by jayeshkumarsundaram • 0

I do not think that any stats magic is going to compensate for the lack of replication and the presence of unwanted technical variation, unless, for the latter, you can meaningfully regress it out and the experimental design allows it. I personally would probably try the lfcShrink() method to correct the fold changes from DESeq2 (in fact that is a major point of the method, see the vignette, e.g. with the "ashr" method), and go along with that if you really need fold changes. I do not know though how reliable shrunken fold change estimates are without replication. The problem with any non-standard method is that you lack ground truth to benchmark against. I would therefore stick with what DESeq2 provides out of the box. That is not always better (here I think it is), but it is at least established and automated.
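A minimal sketch of the lfcShrink() suggestion, reusing the simulated design from the example in the answer. It assumes both DESeq2 and the ashr package are installed. `type = "ashr"` is chosen because it accepts a `contrast` argument directly. How trustworthy the shrunken values are for an unreplicated contrast remains an open question, as said above.

```r
suppressMessages(library(DESeq2))

# Same simulated design as in the answer: A replicated, B and C not.
set.seed(1)
dds <- makeExampleDESeqDataSet(m = 4)
dds$condition <- factor(c("A", "A", "B", "C"))
dds <- DESeq(dds, quiet = TRUE)

# Unshrunken and shrunken log2 fold changes for the unreplicated B vs C contrast.
res_raw <- results(dds, contrast = c("condition", "B", "C"))
res_shr <- lfcShrink(dds, contrast = c("condition", "B", "C"),
                     type = "ashr", quiet = TRUE)

# Compare the spread of the estimates before and after shrinkage.
summary(abs(res_raw$log2FoldChange))
summary(abs(res_shr$log2FoldChange))
```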

ADD REPLY • link 16 months ago by ATpoint 82k

Cross-posted...

https://bioinformatics.stackexchange.com/questions/20273/manually-calculating-log2-fold-change-values-from-deseq2-normalized-counts

https://support.bioconductor.org/p/9148604/

ADD REPLY • link 16 months ago by ATpoint 82k