Statistical Distributions In RNA-Seq Data Analysis
11.9 years ago
Ngsnewbie ▴ 380

I came to know that RNA-seq data follows a binomial / negative binomial distribution. I am not a statistician, but I have studied the basics of statistics, statistical terms, probability, distributions and statistical tests on the internet. The texts available online use examples such as coin flipping, playing cards and throwing dice, which helped me understand the (basic) statistics behind them. But when I come to RNA-seq data, I am not able to relate it to those examples or comprehend it.

Can anyone explain (or provide a link that explains) the distribution of RNA-seq data (e.g. binomial / negative binomial) and the statistical tests (e.g. t-test), using an example of RNA-seq count/FPKM data where the input parameters are:

1. Number of genes in the organism

2. Number of reads mapped to these genes

Thanks in Advance :)

statistics rna • 7.1k views

I don't think you will find a derivation for why the negative binomial is used for RNA-Seq in the same way that, for example, the binomial distribution is used to model card games or the Poisson is used to model the number of customers per hour. In real life, the number of reads counted for any gene tends to vary between individuals more than the Poisson distribution (which is what is usually used for count data) would predict. The negative binomial is used because it matches what is observed more accurately than the Poisson does. As frustrating as this sounds, it is still better than microarrays.
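To make the "more variable than Poisson" point concrete, here is a rough simulation sketch in Python. It is only an illustration with made-up numbers, not how DESeq or edgeR fit anything: the function name `simulate_counts`, the mean of 100 reads and the dispersion of 0.2 are all assumptions chosen for the example. It relies on the standard fact that a negative binomial can be generated as a Poisson whose rate itself varies between individuals following a gamma distribution, which gives variance = mean + dispersion * mean^2 instead of variance = mean.

```python
import numpy as np


def simulate_counts(mean=100.0, dispersion=0.2, n_samples=10_000, seed=0):
    """Simulate read counts for one hypothetical gene across many individuals.

    All parameter values are illustrative assumptions, not real RNA-seq data.
    Returns (poisson_counts, nb_counts).
    """
    rng = np.random.default_rng(seed)

    # Pure Poisson: every individual has exactly the same underlying
    # expression rate, so the only noise is counting (sampling) noise.
    poisson_counts = rng.poisson(mean, size=n_samples)

    # Gamma-Poisson mixture (equivalent to a negative binomial): each
    # individual gets its own underlying rate, drawn from a gamma
    # distribution with the given mean and variance = dispersion * mean^2.
    rates = rng.gamma(shape=1.0 / dispersion, scale=mean * dispersion,
                      size=n_samples)
    nb_counts = rng.poisson(rates)

    return poisson_counts, nb_counts


if __name__ == "__main__":
    pois, nb = simulate_counts()
    for name, counts in [("Poisson", pois), ("Negative binomial", nb)]:
        print(f"{name}: mean={counts.mean():.1f}, variance={counts.var():.1f}")
```

Under these assumptions both sets of counts have roughly the same mean, but the variance of the second set is far larger than its mean. That extra spread stands in for biological variation between individuals, which is the part a plain Poisson model cannot capture.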


Read the DESeq and edgeR papers. It is well explained in them.


Look at the 5th response (by Simon Anders) in this Seqanswers forum post.

