Question: RNA-Seq data distribution
ebrudermanver wrote (3.6 years ago):

The papers I read usually claim that, since RNA-Seq is count data, a Poisson or negative binomial distribution is the most suitable model for it. However, as a computational biologist, I have never seen an RNA-Seq dataset composed of integers. Every dataset I have seen contains decimals, probably because a standard normalization step, which is crucial, has been applied to the raw read counts. This normalization usually adjusts for sequencing depth and also for overdispersion. So my question is: how can we model those decimal numbers with a Poisson or negative binomial distribution? As I said, I have never seen processed (or normalized) RNA-Seq data that contain integers. What am I missing?

distribution rna-seq • 2.0k views
GZ1995 wrote (3.6 years ago):

Most software (DESeq2, edgeR) models the raw counts rather than normalized counts (raw counts divided by the size factors). The NB model assumes raw counts as input, and feeding it normalized counts breaks the NB mean-variance relationship. If you have FPKM values, which are decimals, you cannot use them directly with any discrete model. My suggestion is to get the raw counts and fit an NB model, or to switch to limma with eBayes(trend=T) on the FPKM values.
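
The mean-variance point can be checked with a few lines of arithmetic. The sketch below is in Python rather than R and assumes the common NB parametrization Var(K) = mu + alpha * mu^2, where alpha is the dispersion. The function names and the numbers (mu = 100, alpha = 0.1) are made up for illustration and are not part of DESeq2 or edgeR.

```python
def nb_variance(mu, alpha):
    """NB variance function (assumed parametrization): mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def scaled_moments(mu, alpha, size_factor):
    """Exact mean and variance of K / size_factor when K ~ NB(mu, alpha).

    Dividing a random variable by a constant s divides its mean by s
    and its variance by s^2.
    """
    mean = mu / size_factor
    var = nb_variance(mu, alpha) / size_factor ** 2
    return mean, var


if __name__ == "__main__":
    mu, alpha = 100.0, 0.1  # hypothetical raw-count mean and dispersion
    for s in (0.5, 1.0, 2.0):  # hypothetical size factors
        mean, var = scaled_moments(mu, alpha, s)
        # Variance an NB model would expect at the normalized mean
        expected = nb_variance(mean, alpha)
        print(f"size factor {s}: mean={mean:.1f}, "
              f"actual var={var:.1f}, NB-implied var={expected:.1f}")
```

With a size factor of 1 the two variances agree. For any other size factor, the variance of the normalized values no longer matches the NB variance function evaluated at the normalized mean, which is why these tools take raw counts and account for the size factors inside the model.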
