Question

Is RNA-seq Overdispersion Due To Sample Prep Bottlenecks?

12

12.2 years ago

Stan Letovsky ▴ 140

- Is Poisson a better approximation for high expressors than for low expressors (i.e., are low expressors more overdispersed than high expressors)?

- Is it possible that overdispersion results from bottlenecks in the sample prep process that leave only small numbers of low-expressor molecules at points upstream of the final read sampling? It is not clear what the effective molecule population sizes are for low expressors during poly-A selection, etc. If rare mRNAs were being sampled from small populations at one or more upstream bottlenecks, wouldn't we expect a convolution of Poissons due to successive samplings, not a sum (negative binomial)? We would also predict more overdispersion for low expressors, and that the overdispersion would be observable between libraries prepared from the same sample but not within repeated sequencings of a single library, which should look Poisson because only read sampling is applied. Does anyone know if this conjecture is consistent with observations?
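
The conjecture can be explored with a toy simulation. This is only a sketch under assumptions that are not in the post: the bottleneck is modeled as a Poisson number of surviving molecules, every survivor yields the same expected number of reads, and the names `simulate_counts`, `fano`, `mean_molecules` and `reads_per_molecule` are made up for illustration.

```python
import numpy as np

def simulate_counts(mean_molecules, reads_per_molecule, n_reps, same_library, rng):
    """Read counts for one gene across n_reps sequencing runs (toy model)."""
    if same_library:
        # One library sequenced repeatedly: the bottleneck is drawn once
        # and shared by every run, so only read sampling varies.
        survivors = np.full(n_reps, rng.poisson(mean_molecules))
    else:
        # A fresh library per run: the bottleneck is redrawn each time.
        survivors = rng.poisson(mean_molecules, size=n_reps)
    # Final read sampling, Poisson around the library's expected yield.
    return rng.poisson(reads_per_molecule * survivors)

def fano(x):
    """Variance-to-mean ratio; 1 is the Poisson expectation."""
    return x.var(ddof=1) / x.mean()

rng = np.random.default_rng(0)
between = simulate_counts(20, 10, 5000, same_library=False, rng=rng)
within = simulate_counts(20, 10, 5000, same_library=True, rng=rng)
print("between libraries:", fano(between))
print("within one library:", fano(within))
```

In this particular model the law of total variance gives var/mean = 1 + `reads_per_molecule` between libraries and 1 within a library, so it reproduces the between-versus-within prediction but says nothing about whether real bottlenecks behave this way.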

rna • 5.2k views

ADD COMMENT • link updated 8.4 years ago by Biostar 20 • written 12.2 years ago by Stan Letovsky ▴ 140

1

"...the overdispersion would be observable between libraries prepared from the same sample, but not within repeated sequencings of a single library, which should look Poisson because only read-sampling is applied." Yes, this is the consensus view from what I know. Don't know about low vs high expressors.

ADD REPLY • link 12.2 years ago by Mikael Huss 4.8k

score 4 · Answer 1 · 2012-02-12

4

12.2 years ago

Zev.Kronenberg 12k

In the NGS data I have looked at (exome/genome sequence data), I also see overdispersion. Poisson is supposed to fit, but after spending some time modeling these data I can clearly see that it only fits when the depth of sequencing is very low. The negative binomial is much more justifiable.

I don't think there is that much biological meaning in these trends and I would caution against drawing conclusions from them.
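
For reference, the two models mentioned above differ in how the variance scales with the mean. The notation here (μ for the mean count, α for the negative binomial dispersion parameter) is one common parameterization and is an assumption, not something stated in this thread:

```latex
\text{Poisson: } \operatorname{Var}(X) = \mu
\qquad
\text{Negative binomial: } \operatorname{Var}(X) = \mu + \alpha \mu^{2}
\quad\Rightarrow\quad
\frac{\operatorname{Var}(X)}{\mu} = 1 + \alpha \mu
```

Under this parameterization the variance-to-mean ratio approaches 1 as μ becomes small, which would be consistent with Poisson appearing to fit only at very low depth.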

ADD COMMENT • link 12.2 years ago by Zev.Kronenberg 12k

1

Do you see over-dispersion when you compare two sets of reads from the same sample with different library prep, when you compare biological replicates, or when you compare two sets of reads from the same library?

ADD REPLY • link 12.2 years ago by Qdjm 1.9k

1

Zev & qdjm, thanks for your comments. I looked at my own RNA-seq data to determine whether overdispersion was less for low expressors, using var/mean as the measure of overdispersion (it should be 1 for Poisson). Zev is right: it does go down, though values as high as 10 are still common at the low end (and 50 at the high end). Oddly, these are technical replicates, so the noise is not biological; it must be introduced after RNA purification.
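
A minimal sketch of the var/mean check described above, assuming a genes-by-replicates count matrix that has already been normalized for library size. The function name `dispersion_index` and the simulated Poisson matrix are hypothetical stand-ins, not the data discussed in this thread.

```python
import numpy as np

def dispersion_index(counts):
    """Per-gene mean and variance-to-mean ratio (genes x replicates)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance
    # Genes with zero mean have no defined ratio; leave them as NaN.
    ratio = np.full(mean.shape, np.nan)
    np.divide(var, mean, out=ratio, where=mean > 0)
    return mean, ratio

# Toy data: 200 low and 200 high expressors, 50 purely Poisson replicates.
rng = np.random.default_rng(0)
true_means = np.repeat([5.0, 500.0], 200)
toy = rng.poisson(true_means[:, None], size=(400, 50))

mean, ratio = dispersion_index(toy)
for label, mask in [("low", true_means < 50), ("high", true_means >= 50)]:
    # For Poisson data both bins should average close to 1.
    print(label, round(float(np.nanmean(ratio[mask])), 2))
```

With only a handful of replicates the per-gene ratio is very noisy, so averaging it within expression bins, as in the loop above, is usually more informative than reading individual genes.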

ADD REPLY • link 12.2 years ago by Stan Letovsky ▴ 140

0

I haven't had biological replicates. This observation is from depth files generated with samtools.

ADD REPLY • link 12.2 years ago by Zev.Kronenberg 12k