Question: fold change vs standardized effect size
mvlombardo50 wrote (4.5 years ago):

Hi all,

Just a quick question (as a newcomer to bioinformatics) regarding effect size in differential expression analysis.  Why does the field opt for using fold change as a metric of effect size?  Fold change doesn't take into account variability, whereas standardized effect size measures like Cohen's d do.  So why doesn't the field report effect sizes that take into account variability?

To illustrate, say gene X has a mean of 7.20 in condition A and 7.60 in condition B. The fold change for condition B relative to condition A is 7.60/7.20, or about 1.06. Say the standard deviation estimate for condition A is 0.09, while for condition B it's 0.10. Computing Cohen's d on this gives an effect size of about 4.2, which is a gigantic effect. Fold change and Cohen's d differ dramatically, so why not report effect size estimates that take variability into account rather than fold change?
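For anyone who wants to reproduce the arithmetic, here is a minimal sketch working from the summary statistics above. It assumes the two conditions have equal numbers of replicates, so the pooled SD is simply the root mean square of the two group SDs; with unequal group sizes the pooled SD would need to be weighted by degrees of freedom.

```python
import math

def fold_change(mean_a: float, mean_b: float) -> float:
    """Ratio of group means: condition B relative to condition A."""
    return mean_b / mean_a

def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd

# Values from the example: means 7.20 / 7.60, SDs 0.09 / 0.10.
fc = fold_change(7.20, 7.60)
d = cohens_d(7.20, 0.09, 7.60, 0.10)
print(f"fold change = {fc:.2f}, Cohen's d = {d:.2f}")
```

One caveat: if the means are already on a log2 scale (as many expression values are), the fold change would be 2 ** (7.60 - 7.20), about 1.32, rather than the ratio of the means.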

Thanks,

Mike

Devon Ryan (Freiburg, Germany) wrote (4.5 years ago):

Historical inertia mostly.

Having said that, there's a slowly growing movement within biology-related fields to report confidence intervals alongside every estimate. While those obviously aren't standardized effect sizes, they serve a similar purpose (e.g., you can use them for power calculations, which more people should actually do). Actually, I find confidence intervals more useful than standardized effect sizes, but perhaps that's just me.
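As a rough illustration of the confidence-interval and power-calculation point, here is a sketch that uses only the Python standard library. It relies on a normal approximation throughout, so with only a handful of replicates a t-quantile would give a wider (more honest) interval. The replicate count of 10 is a hypothetical value, not something from the original question.

```python
import math
from statistics import NormalDist

def diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Approximate CI for (mean_b - mean_a): Welch standard error, normal quantile."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean_b - mean_a
    return diff - z * se, diff + z * se

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a standardized effect d
    with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Hypothetical: the gene X summary statistics with 10 replicates per condition.
lo, hi = diff_ci(7.20, 0.09, 10, 7.60, 0.10, 10)
print(f"95% CI for the difference: ({lo:.3f}, {hi:.3f})")

# Planning a follow-up experiment for a more modest effect, d = 0.5.
print("replicates per group for d = 0.5:", n_per_group(0.5))
```

A plausible range for the effect, taken from an interval like this one, can feed the planning step instead of a single point estimate.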

I should add that standardized effect sizes are not a panacea. My original background is ion channel biophysics, where we can measure very, very small changes extremely reliably. I could trivially get Cohen's d values vastly larger than the one you showed, but the results would still be biologically meaningless. That's the problem with relying too much on any single number (this goes doubly for p-values): it's easy to point at and yell, "This is important!", when a finding actually isn't.
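To make the point concrete, here is a small sketch with made-up numbers. The same 1% change (100 to 101) is measured with progressively less noise. The ratio never changes, but Cohen's d scales as 1/SD, reaching about 1, 10 and 100. Equal group sizes and equal SDs in both groups are assumed for simplicity.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, assuming two groups of equal size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd

# Hypothetical readings: a fixed 1% change measured with ever-smaller noise.
results = []
for sd in (1.0, 0.1, 0.01):
    ratio = 101.0 / 100.0
    d = cohens_d(100.0, sd, 101.0, sd)
    results.append((sd, ratio, d))
    print(f"sd={sd}: ratio={ratio:.2f}, Cohen's d={d:.0f}")
```

Nothing about the biology changed between rows, only the measurement precision did.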

Hi Devon,

Thanks for the quick answer on this.  I would agree that confidence intervals are quite useful pieces of information and would probably go well with something like Cohen's d.  And couldn't agree more when it comes to p-values.

Mike

Michael Love (United States) wrote (4.5 years ago):

DESeq2's posterior log fold changes are "reliable" effect sizes, that is, directly comparable across experiments, because the fold changes from genes with less information (low counts, high variability) are moderated toward zero using Bayes' theorem. We lay out the argument in our paper (http://genomebiology.com/2014/15/12/550). We also provide Wald statistics in the results table, but these are not exactly what you are asking for: they divide by the standard error of the estimate, not the standard deviation of the data. Instead, you could use the expected variance formula for log counts to compute a standardized effect size: V = 1/mu + dispersion. So divide the log fold change by sqrt(1/mu + dispersion), where mu is the mean of normalized counts for the gene.
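A minimal sketch of the suggested standardization, with hypothetical per-gene inputs rather than output from a real DESeq2 run. Two assumptions are ours. First, V = 1/mu + dispersion is the delta-method variance of natural-log counts under a negative binomial model, while DESeq2 reports log2 fold changes, so the sketch converts the LFC to natural-log units (multiplying by ln 2) before dividing. Second, mu would plausibly come from the baseMean column (the mean of normalized counts across all samples, not per condition) and the dispersion from the gene's final estimate, e.g. via DESeq2's dispersions() accessor. For contrast, the Wald statistic divides by the LFC's standard error (lfcSE) instead.

```python
import math

def standardized_lfc(log2_fc, mu, dispersion):
    """Log fold change divided by the expected SD of log counts.

    V = 1/mu + dispersion is the variance of natural-log counts (delta method,
    negative binomial), so the log2 fold change is converted to natural-log
    units before dividing. Assumption: mu ~ baseMean, dispersion ~ per-gene estimate.
    """
    expected_sd = math.sqrt(1.0 / mu + dispersion)
    return log2_fc * math.log(2) / expected_sd

def wald_stat(log2_fc, lfc_se):
    """Wald statistic: the estimate divided by its standard error, not the data's SD."""
    return log2_fc / lfc_se

# Hypothetical genes: same log2 fold change, very different amounts of information.
high_count = standardized_lfc(1.0, mu=500.0, dispersion=0.02)
low_count = standardized_lfc(1.0, mu=5.0, dispersion=0.2)
print(f"high-count gene: {high_count:.2f}, low-count gene: {low_count:.2f}")
```

The low-count, high-dispersion gene gets a much smaller standardized effect for the same fold change, which mirrors the variability argument in the original question.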


Hi @Michael! I see this is an old post, but do you suggest using the LFC values directly after applying lfcShrink() with the apeglm method, or do you still suggest dividing these LFCs by sqrt(1/mu + dispersion)? If it's the latter, is mu for each gene given by the baseMean column, and is the dispersion value extracted using dispersionFunction(), or is it the lfcSE column? Many thanks!