Question: Why Does Plotting Log Fold Change Threshold Against Number Of Differentially Expressed Genes Result In A Sigmoid Curve?
6.1 years ago by
pollyD30
Russia/St. Petersburg/Saint Petersburg State University
pollyD30 wrote:

Hi all,

I'm working with RNA microarrays, and I'm a newbie to the field. I processed the results with the limma package for Bioconductor.

I'm trying to decide which fold change threshold I should use. I plotted logFC threshold against the number of differentially expressed genes. I have three conditions, so I got three curves which look very much like reverse sigmoid curves. I've asked colleagues and was told that good quality data usually show such curves but it doesn't imply anything.

So, could you please explain why they look this way, or where I could find an explanation? Also, is it possible to use these data to define a log fold change threshold? Do the plateau, transition and exponential phases have any biological meaning?

bioconductor rna microarray • 3.3k views
modified 5.7 years ago by Biostar ♦♦ 20 • written 6.1 years ago by pollyD30

The curves look as expected IMHO, because you are ignoring the sign of the log fold change. Most genes change expression only marginally, which is why you see a plateau between 0 and 1. If you plot the sorted log fold changes with their sign, you'll see a curve like this: http://en.wikipedia.org/wiki/File:Logistic-curve.svg. Or you'll see a Gaussian curve if you plot the histogram. You can choose a typical joint fold-change and p-value cutoff, say abs(FC) >= 1.5x and p-value < 0.05.

Thanks for the idea, but it doesn't explain what I have. If I do as you suggest, I will get another representation of the volcano plot, won't I?

I might have misled you with the axis label (sorry!). It's a logFC threshold. So the output is the number of genes that are regarded as differentially expressed at this threshold, a kind of cumulative variable. If I plot upregulated and downregulated genes separately, I get similar curves.

For me it's difficult to tell WHY the number of differential proteins vs. a cut-off FC gives such a curve, but my guess is that the 'central part' of the log-FC histogram comes from a Gaussian model and the 'tails', which contain the differential proteins, follow some kind of heavy-tailed distribution. People have tried to model gene/protein expression using mixture models, e.g. a Gaussian combined with a Generalized Pareto: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007454

6.1 years ago by
Woa2.7k
United States
Woa2.7k wrote:

BTW, how are you defining "differentially expressed" at a given threshold? Gaussian data would produce a similar curve, although the shoulder is less broad:

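```r
# Fix the random seed so the simulated curve is reproducible
set.seed(1)

# Simulate 5000 log fold changes from a zero-centred Gaussian (sd = 0.3 * 4 = 1.2)
my.data <- rnorm(5000, 0, 0.3) * 4.0
summary(my.data)
hist(my.data)

# Thresholds on the absolute log fold change
thres <- seq(0.5, 4, 0.5)

# For each threshold, count the values whose absolute log fold change exceeds it
my.sum <- sapply(thres, function(t) sum(abs(my.data) > t))

# Number of "differential" values as a function of the threshold
plot(thres, my.sum, pch = 20, cex = 2.0, col = "hotpink")
lines(thres, my.sum, col = "blue", lty = 4, lwd = 1)
```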
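To make the link between the simulation and the curve shape explicit, here is a short derivation. It is only a sketch under the simulation's own assumption that the log fold changes are independent draws from a zero-centred Gaussian. The symbols $n$ (number of genes), $\sigma$ (standard deviation of the logFC), $t$ (threshold), $N(t)$ (number of genes passing the threshold), $F$ and $f$ (distribution function and density of $|\mathrm{logFC}|$) are notation introduced here, not taken from the thread.

```latex
% Counting genes above a threshold gives n times the survival function of |logFC|
\mathbb{E}[N(t)] = n \, P(|X| > t) = n \, \bigl(1 - F(t)\bigr),
\qquad
\frac{d}{dt}\,\mathbb{E}[N(t)] = -\,n \, f(t).

% Special case of the simulation: X ~ N(0, sigma^2), with sigma = 1.2 and n = 5000
\mathbb{E}[N(t)] = 2n \left(1 - \Phi\!\left(\frac{t}{\sigma}\right)\right),
\qquad
\mathbb{E}[N(\sigma)] = 2 \cdot 5000 \cdot \bigl(1 - \Phi(1)\bigr) \approx 1587.
```

So the curve is a rescaled survival function: it is flat wherever few genes have $|\mathrm{logFC}|$ close to $t$ and steepest where the density $f$ peaks. For a pure zero-centred Gaussian the density of $|X|$ peaks at $t = 0$, so there is little upper plateau, which matches the narrower shoulder in the simulation. A broad plateau at small thresholds would then suggest that few of the counted genes have a $|\mathrm{logFC}|$ near zero, for example because a p-value filter has already removed them, though that reading is an assumption about how the counts were produced.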