Question: Negative Binomial and Poisson distribution of RNA-Seq
5.7 years ago by
ashwini100 wrote:

Dear All,

I am a Biologist trying to understand the statistics of RNA-Seq data.

RNA-Seq counts across biological replicates are said to follow a negative binomial (NB) distribution, because the NB distribution accounts for overdispersion in the data. I am not sure how to verify this for my own data.

Although I have understood these distributions from standard textbooks, I am unable to relate them to RNA-Seq.

Differential expression is my aim.

I have simulated data, to test and understand some open source tools like edgeR, DESeq, Cufflinks etc.

I have real data set too.

I have two conditions with four replicates each.

To know whether my data fit an NB or a Poisson distribution, do I have to check this across the replicates of each gene, separately for each condition?

If the above point is right, how do I do it?

Should I do a goodness-of-fit test such as the chi-square test, or is the mean-variance relationship enough?

Thanks in advance for your valuable inputs.

ADD COMMENTlink modified 5.7 years ago by Damian Kao15k • written 5.7 years ago by ashwini100
5.7 years ago by
Damian Kao15k
Damian Kao15k wrote:

Poisson distribution accounts for technical variance. NB distribution accounts for both technical and biological variance. 

NB distribution is also a Poisson-gamma mixture distribution. Imagine you have a single biological sample (RNA extract) that you take aliquots out of to make technical replicates. These technical replicates will be Poisson distributed.

Now imagine you have multiple biological samples. You take multiple technical replicates out of each biological sample. Each biological sample now has its own Poisson distribution, with its own mean. The variation of those Poisson means across biological samples can be described by a gamma distribution. Thus the NB distribution (a Poisson-gamma mixture) is used for RNA-seq.

You can also think of it this way: the rate parameter (lambda) of the Poisson distribution is itself gamma distributed.
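To make the Poisson-gamma picture concrete, here is a minimal simulation sketch in Python with NumPy. The mean count (100) and the dispersion (0.2) are made-up illustrative values, not estimates from any real dataset. Technical replicates share one fixed Poisson rate, while each biological replicate gets its own rate drawn from a gamma distribution.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

mean_expr = 100.0    # assumed average count for one gene (illustrative)
dispersion = 0.2     # assumed NB dispersion (illustrative)
n_samples = 100_000  # many replicates, so the sample moments are stable

# Technical replicates of ONE biological sample: a single fixed Poisson rate.
technical = rng.poisson(lam=mean_expr, size=n_samples)

# Biological replicates: every sample has its own rate, drawn from a gamma
# distribution with mean `mean_expr` and variance `dispersion * mean_expr**2`.
rates = rng.gamma(shape=1.0 / dispersion, scale=mean_expr * dispersion, size=n_samples)
biological = rng.poisson(lam=rates)

# Variance of the resulting negative binomial: mean + dispersion * mean^2.
expected_nb_var = mean_expr + dispersion * mean_expr**2

print(f"technical   mean={technical.mean():.1f}  var={technical.var(ddof=1):.1f}")
print(f"biological  mean={biological.mean():.1f}  var={biological.var(ddof=1):.1f}")
print(f"NB variance predicted from the mixture: {expected_nb_var:.1f}")
```

Under these assumptions the technical replicates should show a variance close to their mean, while the variance of the biological replicates should land near mean + dispersion * mean^2, which is the overdispersion the NB distribution models.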

ADD COMMENTlink written 5.7 years ago by Damian Kao15k

Thanks a lot for the reply. It is helpful.

In my post, I also mentioned simulations. If I have simulated data, how can I check how well it fits a particular distribution (Poisson or NB)? Is a test of mean vs. variance, or of dispersion, enough to be sure whether or not the data fit a Poisson distribution?

ADD REPLYlink written 5.7 years ago by ashwini100

It is important to remember that these distributions describe variance across replicates. I guess what you can do with your simulation is to produce thousands of simulated libraries. Generate them with the same library size so you don't have to normalize, and treat each simulated library as a biological replicate. Then look at the distribution of tag counts for a specific transcript across all your biological replicates, and see whether this distribution fits the NB or the Poisson better.
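The procedure above could be sketched roughly as follows in Python. Everything named here is an assumption for illustration: the helper functions `poisson_loglik`, `nb_loglik` and `compare_fits` are hypothetical, the true mean (50) and dispersion (0.1) are made up, and the dispersion is estimated with a simple method-of-moments formula rather than the estimators that edgeR or DESeq use.

```python
import math
import numpy as np

def poisson_loglik(counts, mu):
    """Log-likelihood of the counts under Poisson(mu)."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)

def nb_loglik(counts, mu, dispersion):
    """Log-likelihood under an NB with mean mu and variance mu + dispersion * mu**2."""
    r = 1.0 / dispersion  # NB "size" parameter
    log_p = math.log(r / (r + mu))
    log_q = math.log(mu / (r + mu))
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * log_p + k * log_q
        for k in counts
    )

def compare_fits(counts):
    """Fit both distributions by moments and report their log-likelihoods."""
    counts = np.asarray(counts)
    mu = float(counts.mean())
    var = float(counts.var(ddof=1))
    # Method-of-moments dispersion, floored just above zero when var <= mean.
    dispersion = max((var - mu) / mu**2, 1e-8)
    values = counts.tolist()
    return {
        "mean": mu,
        "variance": var,
        "dispersion": dispersion,
        "poisson_loglik": poisson_loglik(values, mu),
        "nb_loglik": nb_loglik(values, mu, dispersion),
    }

# Simulate one transcript across many libraries of equal size.
rng = np.random.default_rng(1)
n_reps = 2000                     # "thousands of simulated libraries"
true_mean, true_disp = 50.0, 0.1  # assumed values, for illustration only
rates = rng.gamma(shape=1.0 / true_disp, scale=true_mean * true_disp, size=n_reps)
simulated = rng.poisson(rates)

result = compare_fits(simulated)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A clearly higher NB log-likelihood points to overdispersion. Since the NB has one extra parameter, a more formal comparison would use a likelihood-ratio test or an information criterion on properly maximized likelihoods; the moment estimates here are only a quick check.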

For your real dataset, there probably aren't enough replicate libraries to fit an NB or a Poisson distribution to.

By the way, I attended an NGS conference last year at the University of Nottingham. A group at the University of Dundee presented their findings from ~50 biological replicates of yeast(?), performed to see whether current statistical theories hold up. If I remember correctly, they did see that the NB fitted the data well. They also said something like 6 biological replicates being optimal for good DE, and that spike-ins helped a lot for DE.

ADD REPLYlink written 5.7 years ago by Damian Kao15k

Do you remember anything in detail surrounding their usage of spike-ins? I'm guessing that they were using them for library-size normalization. The current thinking is generally that spike-ins aren't that useful for most non-single-cell experiments except where there's likely to be gross transcriptional amplification involved. So it'd be interesting if they showed a nice dataset that argued otherwise.

ADD REPLYlink written 5.7 years ago by Devon Ryan94k

I think they did mention single cell experiments, but unfortunately, I do not remember any details. I guess we'll just have to wait for their publication.

ADD REPLYlink written 5.7 years ago by Damian Kao15k
5.7 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

If you have biological replicates, then they're pretty much guaranteed to fit a negative-binomial distribution better than a Poisson distribution (otherwise, there's no biological variance). If you wanted to check, graph variance vs mean for each gene. If the values don't cluster on the variance==mean line, then it's not Poisson.
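As a rough sketch of that check in Python with NumPy, the snippet below computes the per-gene mean and variance for a genes x replicates count matrix of one condition. The helper `mean_variance_table` is hypothetical, and the two matrices are simulated with made-up parameters (5000 genes, four replicates, dispersion 0.1) purely to show what each case looks like. It assumes the libraries have the same size, so no normalization is applied.

```python
import numpy as np

def mean_variance_table(counts):
    """Per-gene mean, variance and variance/mean ratio for a
    genes x replicates count matrix from ONE condition."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    keep = mean > 0  # genes with no reads carry no information
    return mean[keep], var[keep], var[keep] / mean[keep]

rng = np.random.default_rng(2)
n_genes, n_reps = 5000, 4  # four replicates per condition, as in the question
gene_means = rng.lognormal(mean=4.0, sigma=1.5, size=n_genes)  # assumed expression levels
dispersion = 0.1           # assumed biological dispersion

# Case 1: technical variation only (pure Poisson).
poisson_counts = rng.poisson(gene_means[:, None], size=(n_genes, n_reps))

# Case 2: biological variation on top (Poisson-gamma mixture, i.e. NB).
rates = rng.gamma(shape=1.0 / dispersion,
                  scale=gene_means[:, None] * dispersion,
                  size=(n_genes, n_reps))
nb_counts = rng.poisson(rates)

summary = {}
for label, matrix in [("Poisson", poisson_counts), ("NB", nb_counts)]:
    mean, var, ratio = mean_variance_table(matrix)
    summary[label] = {
        "mean_ratio": float(ratio.mean()),
        "frac_above_line": float((var > mean).mean()),
    }
    print(label, summary[label])
```

The `mean` and `var` arrays can be plotted against each other on log-log axes with any plotting library, together with the variance==mean line. With only four replicates the per-gene variance estimates are very noisy, so it is the overall trend across genes that matters, not any single gene.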

ADD COMMENTlink written 5.7 years ago by Devon Ryan94k