Question: RNA-Seq data distribution
Asked 4.2 years ago by ebrudermanver:

In the papers I have read, it is usually claimed that, because RNA-Seq data are counts, a Poisson or negative binomial distribution is the most suitable model for them. However, as a computational biologist, I have not yet seen an RNA-Seq dataset composed of integers. Every RNA-Seq dataset I have seen contains decimals, probably because a standard (and crucial) normalization step is applied to the raw read counts. This normalization usually adjusts for sequencing depth and also for overdispersion. So my question is: how can we model these decimal values with a Poisson or negative binomial distribution? As I said, I have never seen processed (normalized) RNA-Seq data that contain only integers. What am I missing?
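To see where the decimals come from, here is a minimal Python sketch of median-of-ratios normalization, the idea behind DESeq2's default size factors. It is a simplified reimplementation for illustration only, not DESeq2's code; the function name and the toy count matrix are hypothetical. Dividing integer counts by per-sample size factors is enough to turn them into non-integers.

```python
import numpy as np

def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Estimate one size factor per sample from a genes x samples count matrix.

    Hypothetical helper sketching the median-of-ratios idea: every sample is
    compared with a pseudo-reference built from per-gene geometric means.
    """
    # Genes with a zero in any sample have a geometric mean of 0, so drop them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # log(sample / reference)
    return np.exp(np.median(log_ratios, axis=0))      # median ratio per sample

# Toy raw counts (rows = genes, columns = samples); all values are integers.
counts = np.array([
    [10, 20, 30],
    [100, 210, 290],
    [5, 9, 16],
    [0, 3, 1],
])

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # what a "normalized counts" table contains

print(np.round(size_factors, 3))
print(np.round(normalized, 2))
```

The raw matrix is integer-valued, but the normalized one is not, which is why downloaded, already-normalized tables are full of decimals.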

Tags: distribution, rna-seq (2.2k views)
Answer by GZ1995, 4.2 years ago:

Most tools (e.g. DESeq2, edgeR) model the raw counts directly, rather than normalized counts obtained by dividing out the size factors; the size factors (or library sizes) instead enter the model as offsets in the mean. That is the assumption behind the NB model, and you will break its mean-variance relationship if you use normalized counts as input. If you only have FPKM values, which are decimals, you cannot use them directly in any discrete model. My suggestion is to try to get the raw counts for the NB model, or to switch to limma with eBayes(trend=TRUE) on log-transformed FPKM.
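The point about breaking the mean-variance relationship can be checked with a small simulation. This is a minimal Python sketch with simulated data and hypothetical numbers; it uses a Poisson model for simplicity and does not call DESeq2 or edgeR. Raw counts drawn with mean s_j * q have variance equal to their mean, but after dividing by s_j the variance-to-mean ratio becomes 1/s_j, so the normalized values no longer follow the Poisson curve. Keeping the raw counts and putting s_j into the mean (the offset approach) still recovers q.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical gene with true expression q, sequenced in three samples of
# different depth (size factors s_j), each replicated many times.
q = 50.0
size_factors = np.array([0.5, 1.0, 2.0])
n_reps = 200_000

# Raw counts: Poisson with mean s_j * q, so variance equals mean per sample.
raw = rng.poisson(lam=size_factors * q, size=(n_reps, size_factors.size))

# "Normalized" counts: divide out the size factors.
normalized = raw / size_factors

def dispersion_index(x: np.ndarray) -> np.ndarray:
    """Variance / mean per column; approximately 1 for Poisson data."""
    return x.var(axis=0) / x.mean(axis=0)

raw_di = dispersion_index(raw)          # approximately [1, 1, 1]
norm_di = dispersion_index(normalized)  # approximately 1 / s_j = [2, 1, 0.5]

# Offset approach: keep raw counts and model the mean as s_j * q. For this
# Poisson model the maximum-likelihood estimate is sum(counts) / sum(s_j).
q_hat = raw.sum() / (n_reps * size_factors.sum())

print("raw variance/mean:       ", np.round(raw_di, 2))
print("normalized variance/mean:", np.round(norm_di, 2))
print("offset-model estimate of q:", round(float(q_hat), 2))
```

With a negative binomial the Poisson part of the variance scales the same way (it becomes q/s_j after division), so normalized values again fall off the model's mean-variance curve.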
