Question: RNA-Seq data distribution
2.6 years ago by
ebrudermanver50 wrote:

In the papers I read, it is usually claimed that, because RNA-Seq is count data, a Poisson or negative binomial distribution is the most suitable model for it. However, as a computational biologist, I have never seen an RNA-Seq dataset composed of integers. Every dataset I have seen contains decimals, probably because a standard (and crucial) normalization step has been applied to the raw read counts. This normalization usually adjusts for sequencing depth and also for overdispersion. So my question is: how can we model those decimal numbers with a Poisson or negative binomial distribution? As I said, I have never seen processed (or normalized) RNA-Seq data that contain integers. What am I missing?

distribution rna-seq • 1.5k views

Most software (DESeq2, edgeR) models the raw counts and accounts for the size factors inside the model, rather than modeling normalized counts obtained by dividing out the size factors. The NB model assumes raw counts, and you will break its mean-variance relationship if you give it normalized counts as input. If you have FPKM values, which are decimals, you cannot use them directly in any discrete model. My suggestion is to get the raw counts for the NB model, or to switch to limma with eBayes(trend=T) on the (log-transformed) FPKM values.
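To make the mean-variance point concrete, here is a minimal simulation sketch in plain Python. It uses a Poisson model rather than NB to keep it short, and the values of `mu` and `size_factor` as well as the helper `poisson_sample` are made up for illustration; none of this is DESeq2 or edgeR code. It only shows that dividing counts by a size factor changes the variance-to-mean ratio that a count model relies on.

```python
import math
import random
from statistics import fmean, pvariance


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Hypothetical helper for this sketch; adequate for small lam only.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(0)   # fixed seed so the run is reproducible
mu = 5.0                 # assumed "true" expression level (illustrative)
size_factor = 4.0        # assumed sequencing-depth factor (illustrative)

# Raw counts: integers whose expected value is mu * size_factor.
raw = [poisson_sample(rng, mu * size_factor) for _ in range(20000)]

# "Normalized" counts: the size factor divided out, giving decimals.
normalized = [k / size_factor for k in raw]

# For Poisson data the variance equals the mean, so this ratio is close to 1.
raw_ratio = pvariance(raw) / fmean(raw)

# After division the mean shrinks by size_factor but the variance shrinks by
# size_factor squared, so the ratio drops to roughly 1 / size_factor.
norm_ratio = pvariance(normalized) / fmean(normalized)

print(f"raw counts:        variance/mean = {raw_ratio:.3f}")
print(f"normalized counts: variance/mean = {norm_ratio:.3f}")
```

The same reasoning is why the size factor is kept as a term in the model for the raw counts instead of being divided out beforehand.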

modified 2.6 years ago • written 2.6 years ago by GZ1995350