Question: fold change vs standardized effect size
mvlombardo70 wrote (5.7 years ago):

Hi all,

Just a quick question (as a newcomer to bioinformatics) regarding effect size in differential expression analysis.  Why does the field opt for using fold change as a metric of effect size?  Fold change doesn't take into account variability, whereas standardized effect size measures like Cohen's d do.  So why doesn't the field report effect sizes that take into account variability?

To illustrate with an example, say gene X has a mean of 7.20 in condition A and 7.60 in condition B. The fold change for condition B relative to condition A is 7.60/7.20 ≈ 1.06. Say the standard deviation estimate in condition A is 0.09, while in condition B it's 0.10. Computing Cohen's d on this gives an effect size of around 4.2, which is a gigantic effect. Fold change and Cohen's d differ dramatically, so why not report effect size estimates that take variability into account rather than fold change?
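The arithmetic in the example above can be sketched as follows (using the hypothetical means and standard deviations from the question, with a pooled-SD form of Cohen's d):

```python
import math

# Hypothetical values from the example above (e.g. log-scale expression).
mean_a, mean_b = 7.20, 7.60
sd_a, sd_b = 0.09, 0.10

# Fold change ignores variability entirely.
fold_change = mean_b / mean_a  # ~1.06

# Cohen's d standardizes the mean difference by the pooled SD.
pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
cohens_d = (mean_b - mean_a) / pooled_sd  # ~4.2
```

A tiny ratio on the raw scale can thus correspond to a huge standardized effect when the within-condition variability is small.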



Devon Ryan98k (Freiburg, Germany) wrote (5.7 years ago):

Historical inertia mostly.

Having said that, there's a slowly growing movement within bio-related fields to include confidence intervals in all estimate reports. While those obviously aren't standardized effect sizes, they serve a similar purpose (e.g., you can use them for power calculations, which more people should actually do). Actually, I find confidence intervals more useful than standardized effect sizes, but perhaps that's just me.

I should add that standardized effect sizes are not a panacea. My original background is ion channel biophysics, where we can measure very, very small changes in an extremely robust manner. I could trivially get Cohen's d values vastly larger than the one you showed, but the results would still be biologically meaningless. That's the problem with relying too much on any single number (and this goes doubly for p-values): it's easy to point at it and yell, "This is important!", when the finding actually isn't.


Hi Devon,

Thanks for the quick answer on this. I agree that confidence intervals are quite useful pieces of information and would probably pair well with something like Cohen's d. And I couldn't agree more when it comes to p-values.


— written 5.7 years ago by mvlombardo70
Michael Love2.1k (United States) wrote (5.7 years ago):

DESeq2's posterior log fold changes are "reliable" effect sizes, i.e., directly comparable across experiments, because the fold changes from genes with less information (low counts, high variability) are moderated toward zero using Bayes' theorem. We lay out the argument in the DESeq2 paper. We also provide Wald statistics in the results table, but that is not exactly what you are asking for (we divide by the SE of the estimate, not the SD of the data). [Edit]: you could use the expected variance formula for log counts to build your standardized effect size: V = 1/mean + dispersion. So divide the log fold change by sqrt(1/mu + dispersion), where mu is the mean of normalized counts for the gene.


Hi @Michael! I see this is an old post, but do you suggest directly using the LFC values after applying lfcShrink() with the apeglm method, or do you still suggest dividing these LFC by sqrt(1/mu + dispersion)? If it's the latter, is the mu for each gene the baseMean column, and is the dispersion value extracted using dispersionFunction(), or is it the lfcSE column? Many thanks!

— written 16 months ago by manikg140
Powered by Biostar version 2.3.0