Why Does Plotting Log Fold Change Threshold Against Number Of Differentially Expressed Genes Result In A Sigmoid Curve?
11.1 years ago
pollyD ▴ 30

Hi all,

I'm working with RNA microarrays, and I'm a newbie to the field. I processed the results with the limma package for Bioconductor.

I'm trying to decide which fold change threshold I should use. I plotted the logFC threshold against the number of differentially expressed genes. I have three conditions, so I got three curves (plot here), which look very much like reversed sigmoid curves. I asked colleagues and was told that good-quality data usually show such curves, but that this doesn't imply anything.

So, could you please explain why they look this way, or where I could find an explanation? Also, is it possible to use these data to define a log fold change threshold? Do the plateau, transition and exponential phases have any biological meaning?

Thanks in advance!

microarray rna bioconductor • 4.9k views

The curves look as expected IMHO, because you are ignoring the sign of the log fold change. Most genes change expression only marginally, which is why you see a plateau between 0 and 1. If you plot the sorted log fold changes with their sign, you'll see a curve like this: http://en.wikipedia.org/wiki/File:Logistic-curve.svg. Or a Gaussian curve if you plot the histogram. You can choose a typical joint fold change and p-value cut-off, say abs(FC) = 1.5X and p-value 0.05.
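A minimal sketch of those two views and of a joint cut-off, in base R. The data are simulated purely for illustration: the `logFC` spread (0.6) and the standard error used for the toy p-values (0.3) are assumptions, not values from the question.

```r
# Hypothetical data: simulated log2 fold changes, for illustration only
set.seed(1)
n <- 5000
logFC <- rnorm(n, mean = 0, sd = 0.6)
# Toy p-values from a two-sided z-test with an assumed standard error of 0.3
p <- 2 * pnorm(-abs(logFC) / 0.3)

# Sorted signed log fold changes: an S-shaped curve
plot(sort(logFC), type = "l", xlab = "Rank", ylab = "log2 fold change")

# Histogram of the same values: roughly bell-shaped
hist(logFC, breaks = 50, main = "", xlab = "log2 fold change")

# Joint cut-off: at least 1.5-fold change in either direction and p < 0.05
is.de <- abs(logFC) >= log2(1.5) & p < 0.05
sum(is.de)
```

With real limma output, the same filter would be applied to the log fold change and p-value columns of the results table instead of the simulated vectors.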


Thanks for the idea, but it doesn't explain what I have. If I do as you suggest, I will get another representation of the volcano plot, won't I?

I might have misled you with the axis label (sorry!). It's a logFC threshold. So, the output is the number of genes which are regarded as differentially expressed at this threshold, a kind of cumulative variable. If I plot upregulated and downregulated genes separately, I get similar curves.


For me it's difficult to tell WHY the number of differential proteins vs. a cut-off FC gives such a curve, but my guess is that the 'central part' of the log-FC histogram comes from a Gaussian model and the 'tails', which contain the differential proteins, follow some kind of heavy-tailed distribution. People have tried to model gene/protein expression using mixture models, such as a Gaussian combined with a Generalized Pareto: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007454
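To illustrate the guess, here is a rough sketch comparing the threshold curve of purely Gaussian values with that of a Gaussian centre plus heavy tails. It is not the model from the linked paper: the mixing proportion (10%), the tail start (|logFC| = 1) and the Generalized Pareto parameters `sigma` and `xi` are arbitrary assumptions, and `rgpd` is a hypothetical helper defined here.

```r
set.seed(7)
n <- 5000

# Inverse-transform sampler for Generalized Pareto excesses (location 0).
# sigma (scale) and xi (shape, > 0) are illustrative values, not fitted ones.
rgpd <- function(k, sigma, xi) sigma / xi * ((1 - runif(k))^(-xi) - 1)

# Purely Gaussian log fold changes
gauss <- rnorm(n, mean = 0, sd = 0.5)

# Mixture: 90% Gaussian centre, 10% heavy tails starting at |logFC| = 1
n.tail <- round(0.1 * n)
tails <- sample(c(-1, 1), n.tail, replace = TRUE) *
  (1 + rgpd(n.tail, sigma = 0.5, xi = 0.3))
mixture <- c(rnorm(n - n.tail, mean = 0, sd = 0.5), tails)

# Number of values whose absolute value exceeds each threshold
thres <- seq(0, 4, by = 0.1)
count.above <- function(x) sapply(thres, function(th) sum(abs(x) > th))

plot(thres, count.above(gauss), type = "l",
     xlab = "|logFC| threshold", ylab = "Number above threshold")
lines(thres, count.above(mixture), col = "red", lty = 2)
legend("topright", legend = c("Gaussian only", "Gaussian + GPD tails"),
       col = c("black", "red"), lty = 1:2)
```

In this toy setup the mixture curve decays more slowly at high thresholds, because the tail component keeps contributing values well beyond the Gaussian range.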


Well, I definitely have to learn more about the methods I'm using.

Thanks for the link!

11.1 years ago
Woa ★ 2.9k

BTW, how are you defining "differentially expressed" at a given threshold? Gaussian data would produce a similar curve, although the shoulder is less broad:

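# Simulate Gaussian "log fold changes" (effective sd = 0.3 * 4 = 1.2)
my.data <- rnorm(5000, mean = 0, sd = 0.3) * 4.0
summary(my.data)
hist(my.data)

# Thresholds on |value| and the number of values exceeding each one
thres <- seq(0.5, 4, by = 0.5)
my.sum <- sapply(thres, function(th) sum(abs(my.data) > th))

# Count of values above the threshold, plotted against the threshold
plot(thres, my.sum, pch = 20, cex = 2.0, col = "hotpink")
lines(thres, my.sum, col = "blue", lty = 4, lwd = 1)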
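The shape can also be checked against theory: for zero-mean Gaussian values, the expected number exceeding a threshold t in absolute value is N times the two-sided tail probability, which falls from N at t = 0 towards 0 in a reversed-sigmoid fashion. A small sketch, assuming the same standard deviation as the simulation above (0.3 * 4 = 1.2); the seed and the finer threshold grid are my own choices:

```r
sd.assumed <- 0.3 * 4.0   # standard deviation of the simulated values
n <- 5000
thres <- seq(0, 4, by = 0.1)

# Expected number of values with |x| > threshold under a zero-mean Gaussian
expected <- n * 2 * pnorm(thres, mean = 0, sd = sd.assumed, lower.tail = FALSE)

# Observed counts from one simulated data set
set.seed(42)
x <- rnorm(n, mean = 0, sd = sd.assumed)
observed <- sapply(thres, function(th) sum(abs(x) > th))

plot(thres, observed, pch = 20,
     xlab = "|logFC| threshold", ylab = "Number above threshold")
lines(thres, expected, col = "blue")
```

The points should follow the blue line closely, so the overall shape is what any bell-shaped distribution of log fold changes would give.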

I define differentially expressed genes as those having absolute log FC > 0.5 and p < 0.001 (in this case). The p-values come from the linear model and were subjected to Benjamini-Hochberg correction.
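As a rough sketch of that definition in base R, using simulated values rather than my real data: the `logFC` distribution, the standard error behind the toy p-values (0.25) and the helper `count.de` are all assumptions made up for illustration. With limma itself, `topTable()` has `lfc` and `p.value` arguments that apply the same kind of filter.

```r
# Hypothetical per-gene results, simulated for illustration only
set.seed(3)
n <- 5000
logFC <- c(rnorm(4500, mean = 0, sd = 0.3), rnorm(500, mean = 0, sd = 1.5))
p.raw <- 2 * pnorm(-abs(logFC) / 0.25)   # toy p-values, assumed standard error 0.25

# Benjamini-Hochberg correction
p.adj <- p.adjust(p.raw, method = "BH")

# Number of genes called differentially expressed at a given |logFC| threshold
count.de <- function(lfc.cut, p.cut = 0.001) {
  sum(abs(logFC) > lfc.cut & p.adj < p.cut)
}

count.de(0.5)   # the definition used in this thread

# The same count over a range of thresholds
thres <- seq(0, 3, by = 0.1)
n.de <- sapply(thres, count.de)
plot(thres, n.de, type = "b", pch = 20,
     xlab = "|logFC| threshold", ylab = "Number of DE genes")
```

In this toy setup the curve is flat at low thresholds, because the p-value filter alone already excludes genes with small fold changes. Whether that accounts for the plateau in the real data is not something this sketch can establish.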
