Question: What kind of distribution do you expect to see for RNA-Seq expression levels?
sviatoslav.kendall (United States) wrote, 4.9 years ago:

Maybe this question is too broad or vague, but I want to ask it agnostically:

If I plot a histogram of RPKM-normalized gene expression for one gene across a cohort (let's say a TCGA cohort), what shape do you expect to see? A normal-looking bell curve? Skewed to the left or right? Bimodal?


Tag: rna-seq
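For anyone who wants to try the plot the question describes, here is a minimal sketch: RPKM for one gene across a cohort, computed with the usual definition (reads × 10^9 / (gene length in bp × total mapped reads)), followed by a crude text histogram. The counts, gene length, and library sizes are made-up placeholders rather than TCGA data, and the function names are hypothetical.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def text_histogram(values, bins=10):
    """Return a crude text histogram (one string per bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width if all values are equal
    counts = [0] * bins
    for v in values:
        # the maximum value would fall one past the last bin, so clamp it
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [f"{lo + i * width:8.2f} | {'#' * c}" for i, c in enumerate(counts)]


# Hypothetical data: read counts for one 2,000 bp gene in five samples,
# plus the total number of mapped reads in each sample.
counts = [150, 220, 90, 400, 180]
library_sizes = [20e6, 25e6, 18e6, 30e6, 22e6]
values = [rpkm(c, 2000, n) for c, n in zip(counts, library_sizes)]

for line in text_histogram(values, bins=5):
    print(line)
```

With a real cohort you would have hundreds of samples per gene, and the shape of this histogram is exactly what the answers below discuss.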

I was trying to figure this out myself and just found this old thread. In case it is still relevant to anybody: different tools assume the data to be either normally distributed (limma, on the log scale) or negative binomially distributed (EBSeq and DESeq2). You can find a little bit of explanation here:

(Reply by marina.v.yurieva, 10 months ago)
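To see what the negative binomial assumption means in practice, here is a minimal sketch that draws counts as a gamma-Poisson mixture with mean μ and dispersion φ, so that the variance is μ + φμ² (the mean-dispersion form DESeq2 uses), and checks that the variance is well above the mean, i.e. the counts are over-dispersed relative to a Poisson. The mean of 50, dispersion of 0.2, and 2,000 samples are arbitrary illustrative values, not estimates from any dataset. Python's standard library has no Poisson sampler, so a small Knuth-style one is included.

```python
import math
import random


def poisson_sample(lam, rng):
    """Poisson draw by Knuth's multiplication method.

    Large means are split into chunks of at most 100 (a sum of independent
    Poissons is Poisson), so exp(-chunk) never underflows.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 100.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def nb_sample(mean, dispersion, rng):
    """Negative binomial draw as a gamma-Poisson mixture.

    The gamma rate has mean `mean` and variance dispersion * mean**2, so the
    resulting counts have variance mean + dispersion * mean**2.
    """
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)  # (shape, scale)
    return poisson_sample(rate, rng)


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(1)
counts = [nb_sample(50, 0.2, rng) for _ in range(2000)]  # one hypothetical gene
m, v = mean_and_variance(counts)
# Theory: mean 50, variance 50 + 0.2 * 50**2 = 550; a Poisson would have variance ~50
print(f"mean {m:.1f}, variance {v:.1f}")
```

Tools like limma sidestep the count model by working on log-transformed values, which is why the two families of methods make different distributional assumptions.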
Answer by Devon Ryan (Freiburg, Germany), 4.9 years ago:

I honestly wouldn't expect to see any particular distribution. Many genes will have a normal distribution, due to not being differentially expressed in any of the samples. Others will have a skew, typically due to a floor effect. Others will have a bimodal or multimodal distribution, due to being up/down-regulated in some cancer types.

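Two of the shapes Devon describes can be illustrated with a small simulation. This is a sketch under assumed, hypothetical parameters: a stable (non-DE) gene whose log2 expression is roughly N(5, 0.5) across 1,000 samples, and a low-expressed gene whose underlying log2 level is N(-3, 1.5) but which cannot be observed below the floor log2(0.1) ≈ -3.32 that a 0.1 pseudocount creates. The floor makes values pile up at the bottom, which shows up as a positive (right) skew.

```python
import math
import random

# Value that an RPKM of 0 takes after a log2(RPKM + 0.1) transform (about -3.32)
FLOOR = math.log2(0.1)


def skewness(xs):
    """Moment-based sample skewness (g1); > 0 means a longer right tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
n_samples = 1000

# Hypothetical non-DE gene: roughly the same log2 level in every sample
stable = [rng.gauss(5.0, 0.5) for _ in range(n_samples)]

# Hypothetical low-expressed gene: anything below the floor is recorded at the floor
floored = [max(FLOOR, rng.gauss(-3.0, 1.5)) for _ in range(n_samples)]
floor_fraction = sum(v == FLOOR for v in floored) / n_samples

print(f"stable gene skewness:  {skewness(stable):.2f}")
print(f"floored gene skewness: {skewness(floored):.2f}")
print(f"fraction of samples at the floor: {floor_fraction:.2f}")
```

The third case, a bimodal gene that is up- or down-regulated in a subset of samples, can be simulated as a mixture of two populations.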

Hi Devon,

Could you elaborate on why you'd expect to see a normal distribution for many genes, given that RNA-seq count data is generally over-dispersed? I am currently analyzing a large RNA-seq dataset with hundreds of individuals, and have not seen an example of a gene with normally distributed RPKMs. Also -- isn't the skew you're describing due to the mean-variance relationship, i.e. greater variance at greater expression values?

Thanks, Allie

(Reply by allie, 19 months ago)
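The mean-variance relationship Allie mentions can be made concrete with the negative binomial model: the variance is μ + φμ², so on the raw scale it grows roughly quadratically with the mean. A first-order (delta-method) approximation gives Var[ln X] ≈ Var[X]/μ² = 1/μ + φ, which levels off near φ for well-expressed genes. That is one reason over-dispersed counts are often analysed on a log scale. The single dispersion of 0.1 below is a hypothetical value chosen for illustration, and the delta-method approximation is only rough for very low means.

```python
def nb_variance(mean, dispersion):
    """Negative binomial variance with mean mu and dispersion phi: mu + phi * mu**2."""
    return mean + dispersion * mean ** 2


def approx_log_variance(mean, dispersion):
    """Delta-method approximation of Var[ln X]: Var[X] / mu**2 = 1/mu + phi."""
    return nb_variance(mean, dispersion) / mean ** 2


dispersion = 0.1  # hypothetical, for illustration only
for mu in (1, 10, 100, 1000, 10000):
    print(f"mean {mu:>6}: raw variance {nb_variance(mu, dispersion):>12.1f}, "
          f"approx. variance of ln {approx_log_variance(mu, dispersion):.4f}")
```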

Could you elaborate in more detail on the cases you mentioned?

I have been under the impression that a gene that is not differentially expressed would follow a negative binomial distribution due to biological and technical variation. In a pool of cancer samples, over- or under-expressed genes would follow a negative binomial distribution as well, perhaps with even larger variance. In a mixed pool of cancer and normal samples, a differentially expressed gene may follow a bimodal distribution. Do I understand this correctly?

(Reply by CY, 7 months ago)
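CY's picture can be checked with a quick simulation, sketched here under hypothetical parameters: a non-DE gene drawn from a single negative binomial (mean 100, dispersion 0.05, 1,000 samples), and a DE gene in a mixed pool drawn as a 50/50 mixture of 500 "normal" samples at mean 20 and 500 "tumor" samples at mean 400. As a rough numerical check for bimodality it uses Sarle's bimodality coefficient, which is not mentioned in the thread; values above 5/9 hint at bimodality. Note that a mixture only looks bimodal when the group means are far apart relative to the within-group spread.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth-style Poisson draw, chunked so exp(-chunk) never underflows."""
    total = 0
    while lam > 0:
        chunk = min(lam, 100.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def nb_sample(mean, dispersion, rng):
    """NB draw as a gamma-Poisson mixture (variance mean + dispersion * mean**2)."""
    return poisson_sample(rng.gammavariate(1.0 / dispersion, mean * dispersion), rng)


def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient from simple moment estimates of skewness
    and excess kurtosis. Values above 5/9 (~0.556) hint at bimodality."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return (skew ** 2 + 1) / (excess_kurtosis + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def log2_counts(mean, dispersion, n, rng):
    """log2(count + 1) for n NB draws."""
    return [math.log2(nb_sample(mean, dispersion, rng) + 1) for _ in range(n)]


rng = random.Random(42)
non_de = log2_counts(100, 0.05, 1000, rng)                                  # one population
mixed = log2_counts(20, 0.05, 500, rng) + log2_counts(400, 0.05, 500, rng)  # normal + tumor

print(f"non-DE gene BC:        {bimodality_coefficient(non_de):.2f}")
print(f"mixed-pool DE gene BC: {bimodality_coefficient(mixed):.2f}")
```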
Answer by Bert Overduin (Edinburgh Genomics, The University of Edinburgh), 4.9 years ago:

I would expect to see a normal distribution, although bimodal distributions have been observed: Bessarabova et al. Bimodal gene expression patterns in breast cancer. BMC Genomics 2010, 11(Suppl 1):S8.

Answer by Charles Warden (Duarte, CA), 4.9 years ago:

Overall, I would expect each sample's distribution (across all genes) to look mostly normal if you worked with log2(RPKM + 0.1) values, except for a peak at the lower cutoff created by the pseudocount, since every RPKM of 0 becomes log2(0.1) ≈ -3.32. If you wanted, you could remove that peak by filtering out the genes that almost never rise above that cutoff across the samples.
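The transform and filter Charles describes can be sketched as follows, assuming RPKM values are held as a genes-by-samples list of lists. The function names, the 0.01 RPKM threshold for "at the cutoff", and the 90% limit are hypothetical choices for illustration, not values from the thread.

```python
import math


def log2_rpkm(rpkm_matrix, pseudocount=0.1):
    """log2(RPKM + pseudocount) for a genes-by-samples matrix (list of lists)."""
    return [[math.log2(v + pseudocount) for v in row] for row in rpkm_matrix]


def drop_floor_genes(rpkm_matrix, gene_ids, min_rpkm=0.01, max_floor_fraction=0.9):
    """Drop genes whose RPKM is below min_rpkm (i.e. they sit at the log2 floor)
    in more than max_floor_fraction of the samples."""
    kept_ids, kept_rows = [], []
    for gene, row in zip(gene_ids, rpkm_matrix):
        floor_fraction = sum(v < min_rpkm for v in row) / len(row)
        if floor_fraction <= max_floor_fraction:
            kept_ids.append(gene)
            kept_rows.append(row)
    return kept_ids, kept_rows


# Hypothetical RPKM values for three genes in four samples
genes = ["geneA", "geneB", "geneC"]
rpkm = [
    [12.0, 8.5, 15.2, 10.1],  # expressed in every sample
    [0.0, 0.0, 0.0, 0.0],     # never detected: would add to the spike at log2(0.1)
    [0.0, 3.2, 0.0, 5.6],     # detected in some samples
]
kept_ids, kept_rows = drop_floor_genes(rpkm, genes)
log_values = log2_rpkm(kept_rows)
print(kept_ids)  # geneB is dropped
```

A per-sample histogram of `log_values` columns would then show the within-sample distribution without most of the floor spike.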

For a gene-centric distribution, I agree with the other comments: it will vary between genes, and I wouldn't be surprised if it varied depending upon the context of the experiment (for example, depending upon the heterogeneity of the samples).

Maybe it is a bit of a tangent, but I've played around a bit with modeling bimodal gene expression, and I've described my experiences here:

That blog post was influenced by the work I did for this project:


log2(RPKM + 0.1) or log2(RPKM + 1.0)?

(Reply by komal.rathi, 4.9 years ago)

I think that is a matter of personal preference. I think 1.0 is a bit conservative, as it may effectively throw out about half of your genes: everything below roughly 1 RPKM gets squeezed close to 0 on the log scale. For example, see Figure 1 in this paper:

(Reply by Charles Warden, 4.9 years ago)
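The effect of the pseudocount choice on low-expressed genes can be shown with a few lines of arithmetic. This sketch only measures how much of the log2 scale the RPKM range 0 to 1 occupies under each pseudocount. It does not check the "about half of your genes" figure, which depends on the dataset. The function names are hypothetical.

```python
import math


def log2_pseudo(rpkm, pseudocount):
    """log2(RPKM + pseudocount)."""
    return math.log2(rpkm + pseudocount)


def low_range_span(pseudocount, upper=1.0):
    """Width (in log2 units) that RPKM values from 0 to `upper` occupy after the transform."""
    return log2_pseudo(upper, pseudocount) - log2_pseudo(0.0, pseudocount)


for pc in (0.1, 1.0):
    print(f"pseudocount {pc}: RPKM 0..1 spans {low_range_span(pc):.2f} log2 units")
# pseudocount 0.1: RPKM 0..1 spans 3.46 log2 units
# pseudocount 1.0: RPKM 0..1 spans 1.00 log2 units
```

With a pseudocount of 1.0, all genes below 1 RPKM are compressed into a single log2 unit, so differences among them largely disappear, whereas 0.1 keeps them spread over about 3.5 units.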