RNA-Seq data distribution
4.8 years ago

In the papers I read, it is usually claimed that because RNA-Seq is count data, a Poisson or negative binomial distribution is the most suitable model for it. However, as a computational biologist, I have never seen an RNA-Seq dataset composed of integers. Every dataset I have seen contains decimals, probably because a standard (and crucial) normalization step has been applied to the raw read counts. This normalization usually adjusts for sequencing depth and also for overdispersion. So my question is: how can we model those decimal numbers with a Poisson or negative binomial distribution? What am I missing?

4.8 years ago
GZ1995 ▴ 410

Most software (DESeq2, edgeR) models the raw counts, not counts that have been normalized by dividing out the size factors. The NB model assumes raw integer counts, and you will break the NB mean-variance relationship if you give it normalized counts as input. FPKM values are decimals, so you cannot use them directly in any discrete model. My suggestion is to get the raw counts for the NB model, or to switch to limma with eBayes(trend=TRUE) on log2-transformed FPKM.
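To see why dividing by the size factors breaks the mean-variance relationship, here is a minimal Python sketch. It is not code from DESeq2 or edgeR. It assumes the common NB parameterization var = mu + alpha * mu^2, and the function names and the values of mu, alpha and the size factors are made up for illustration.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial with this mean and dispersion (alpha).

    Assumes the parameterization var = mean + dispersion * mean**2.
    """
    return mean + dispersion * mean ** 2


def scaled_moments(mean, dispersion, size_factor):
    """Mean and variance of K / size_factor when K ~ NB(mean, dispersion).

    Dividing a random variable by s divides its mean by s and its
    variance by s**2.
    """
    scaled_mean = mean / size_factor
    scaled_var = nb_variance(mean, dispersion) / size_factor ** 2
    return scaled_mean, scaled_var


# Hypothetical gene: raw-count mean 100, dispersion 0.1.
mu, alpha = 100.0, 0.1

for s in (0.5, 1.0, 2.0):
    m, v = scaled_moments(mu, alpha, s)
    # Variance an NB with the same dispersion would have at the scaled mean.
    implied = nb_variance(m, alpha)
    print(f"size factor {s}: mean {m:.1f}, actual variance {v:.1f}, "
          f"NB-implied variance {implied:.1f}")
```

The two variances agree only when the size factor is 1. Otherwise the Poisson part of the variance is rescaled by 1/s, so the normalized values no longer follow an NB with the same dispersion (and they are no longer integers either). That is why the size factors are kept inside the model instead of being divided out of the data beforehand.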
