Question: Why Does Plotting Log Fold Change Threshold Against Number Of Differentially Expressed Genes Result In A Sigmoid Curve?
6.1 years ago by
Russia/St. Petersburg/Saint Petersburg State University
pollyD30 wrote:

Hi all,

I'm working with RNA microarrays, and I'm a newbie to the field. I processed the results with the limma package for Bioconductor.

I'm trying to decide which fold change threshold I should use. I plotted the logFC threshold against the number of differentially expressed genes. I have three conditions, so I got three curves (plot here) that look very much like reverse sigmoid curves. I asked colleagues and was told that good-quality data usually show such curves, but that this doesn't imply anything.

So, could you please explain why they look this way, or where I could find an explanation? Also, is it possible to use these data to define a log fold change threshold? Do the plateau, transition and exponential phases have any biological meaning?

Thanks in advance!

bioconductor rna microarray • 3.3k views
modified 5.7 years ago by Biostar ♦♦ 20 • written 6.1 years ago by pollyD30

The curves look as expected IMHO, because you are ignoring the sign of the log fold change. Most genes change expression only marginally, which is why you see a plateau between 0 and 1. If you plot the sorted log fold changes with sign, you'll see a curve like this, and a Gaussian curve if you plot the histogram. You can choose a typical joint fold change and p-value cutoff, say abs(FC) = 1.5X and p-value 0.05.
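As a rough illustration of such a joint cutoff, the sketch below filters a simulated results table. The column names `logFC` and `adj.P.Val` follow the ones that limma's `topTable` reports; the data are made up, and the 1.5-fold change is converted to the log2 scale on the assumption that the logFC values are log2 ratios.

```r
# Simulated stand-in for a limma topTable() result (values are made up).
set.seed(1)
n <- 1000
res <- data.frame(
  logFC     = rnorm(n, mean = 0, sd = 0.8),
  adj.P.Val = runif(n)
)

# A 1.5-fold change expressed on the log2 scale (about 0.585).
lfc_cut <- log2(1.5)
p_cut   <- 0.05

# Keep genes that pass both the fold-change and the p-value cutoff.
is_de <- abs(res$logFC) >= lfc_cut & res$adj.P.Val < p_cut
de    <- res[is_de, ]
nrow(de)
```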

modified 6.1 years ago • written 6.1 years ago by Woa2.7k

Thanks for the idea, but it doesn't explain what I have. If I do as you suggest, I will get another representation of the volcano plot, won't I?

I might have misled you with the axis label (sorry!). It's a logFC threshold. So the output is the number of genes regarded as differentially expressed at this threshold, a kind of cumulative variable. If I plot upregulated and downregulated genes separately, I get similar curves.
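To make the "cumulative" nature of such a curve concrete: the number of genes passing a threshold t is the number of genes with |logFC| > t, which is N times the empirical survival function of |logFC|. A minimal sketch on simulated values follows; the helper `count_above` and the data are illustrative only and are not taken from the analysis described here.

```r
# Simulated log fold changes; made-up values for illustration only.
set.seed(42)
logfc <- rnorm(5000, mean = 0, sd = 1)

thresholds <- seq(0, 4, by = 0.25)

# Hypothetical helper: number of values exceeding each threshold.
count_above <- function(x, thr) {
  vapply(thr, function(t) sum(x > t), integer(1))
}

n_up   <- count_above(logfc, thresholds)       # upregulated: logFC > t
n_down <- count_above(-logfc, thresholds)      # downregulated: logFC < -t
n_all  <- count_above(abs(logfc), thresholds)  # either direction

# The same counts expressed through the empirical CDF of |logFC|.
n_ecdf <- length(logfc) * (1 - ecdf(abs(logfc))(thresholds))
```

Because each count is a survival function scaled by N, the curve can only decrease as the threshold grows, and the up and down counts add up to the sign-free count.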

modified 6.1 years ago • written 6.1 years ago by pollyD30

For me it's difficult to tell WHY the number of differential proteins vs. a cutoff FC gives such a curve, but my guess is that the 'central part' of the log-FC histogram comes from a Gaussian model and the 'tails', which contain the differential proteins, follow some kind of heavy-tailed distribution. People have tried to model gene/protein expression using mixture models, viz. a Gaussian combined with a generalized Pareto.
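A small sketch of how such a mixture would change the threshold curve, under loudly stated assumptions: base R has no generalized Pareto distribution, so a Student t distribution with 2 degrees of freedom stands in for the heavy-tailed component, and the mixing weight and scales are arbitrary. The expected counts are computed analytically rather than simulated.

```r
n      <- 5000                 # number of genes (arbitrary)
thres  <- seq(0, 4, by = 0.5)  # thresholds on |logFC|
sd0    <- 0.5                  # spread of the Gaussian core (assumed)
w_tail <- 0.1                  # fraction of genes in the heavy tails (assumed)

# P(|X| > t) for a zero-centred Gaussian and for a t distribution (df = 2).
p_gauss <- 2 * pnorm(-thres / sd0)
p_heavy <- 2 * pt(-thres, df = 2)

# Expected number of genes beyond each threshold.
exp_gauss   <- n * p_gauss
exp_mixture <- n * ((1 - w_tail) * p_gauss + w_tail * p_heavy)

round(cbind(thres, exp_gauss, exp_mixture), 1)
```

In this toy setup the two curves start at the same value, but the mixture still expects a few dozen genes beyond |logFC| = 3, where the pure Gaussian expects essentially none.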

written 6.1 years ago by Woa2.7k

Well, I definitely have to learn more about the methods I'm using.

Thanks for the link!

written 6.1 years ago by pollyD30
6.1 years ago by
United States
Woa2.7k wrote:

BTW, how are you defining "differentially expressed" at a given threshold? Gaussian data would produce a similar curve; however, the shoulder is less broad.

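rm(list=ls())
# x: simulated log fold changes (the variable name is assumed; it is missing from the post)
x <- rnorm(5000,0,0.3)*4.0
summary(x)
thres <- seq(0.5,4,0.5)
my.sum <- rep(NA,length(thres))

# count the values whose absolute size exceeds each threshold
for (i in seq_along(thres)) {
    my.sum[i] <- length(x[abs(x) > thres[i]])
}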

written 6.1 years ago by Woa2.7k

I define differentially expressed genes as those having an absolute log FC > 0.5 and p < 0.001 (in this case). The p-values come from the linear model and were subjected to Benjamini-Hochberg correction.
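With a definition like this, one thing that can produce a flat stretch at low thresholds is the p-value filter itself: as long as the logFC threshold stays below the smallest |logFC| among the genes that pass the p-value cut, raising it removes nothing. The sketch below illustrates this on simulated data only. It assumes a constant standard error and a plain z-test in place of limma's moderated t-statistic, so it is not a reconstruction of the analysis described here.

```r
# Simulated experiment: made-up numbers, not data from this thread.
set.seed(7)
n      <- 5000
se     <- 0.3                                   # assumed standard error per gene
effect <- c(rnorm(500, mean = 0, sd = 1.5),     # 10% of genes truly change
            rep(0, n - 500))
logfc  <- effect + rnorm(n, mean = 0, sd = se)

# Two-sided z-test p-values, then Benjamini-Hochberg adjustment.
p_raw <- 2 * pnorm(-abs(logfc / se))
p_adj <- p.adjust(p_raw, method = "BH")

signif_gene <- p_adj < 0.001
thres <- seq(0, 4, by = 0.25)

# Genes counted as differentially expressed at each logFC threshold.
n_de <- vapply(thres, function(t) sum(signif_gene & abs(logfc) > t), integer(1))

# Smallest |logFC| that still passes the p-value filter.
min_sig <- min(abs(logfc[signif_gene]))
```

Plotting `n_de` against `thres` gives a curve that is flat up to `min_sig` and falls only beyond it.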

written 6.1 years ago by pollyD30