Is RNA-seq Overdispersion Due to Sample-Prep Bottlenecks?
12.2 years ago
Stan Letovsky

- Is the Poisson distribution a better approximation for high expressors than for low expressors (i.e., are low expressors more overdispersed than high expressors)?

- Is it possible that overdispersion results from bottlenecks in the sample-prep process, where low expressors are present in small numbers at points upstream of the final read sampling? It is not clear what the effective population sizes of low-expressor molecules are during poly-A selection and similar steps. If rare mRNAs were sampled from small populations at one or more upstream bottlenecks, wouldn't we expect a compound of the successive Poisson samplings rather than a gamma-Poisson mixture (the negative binomial)? We would also predict more overdispersion for low expressors, and that the overdispersion would be observable between libraries prepared from the same sample, but not within repeated sequencings of a single library, which should look Poisson because only read-sampling is applied. Does anyone know whether this conjecture is consistent with observations?
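
A minimal two-stage version of this conjecture can be worked through with the law of total variance. As an illustrative assumption (not an established model of any particular prep), let N, the number of a transcript's molecules surviving the upstream bottleneck, be Poisson with mean μ, and let the read count R given N be Poisson with mean λN, where λ is the expected number of reads per surviving molecule (amplification and sequencing depth folded together):

```latex
\begin{aligned}
N &\sim \mathrm{Poisson}(\mu), \qquad R \mid N \sim \mathrm{Poisson}(\lambda N) \\
\mathbb{E}[R] &= \lambda\mu \\
\mathrm{Var}(R) &= \mathbb{E}[\mathrm{Var}(R \mid N)] + \mathrm{Var}(\mathbb{E}[R \mid N]) = \lambda\mu + \lambda^{2}\mu \\
\frac{\mathrm{Var}(R)}{\mathbb{E}[R]} &= 1 + \lambda, \qquad
\phi = \frac{\mathrm{Var}(R) - \mathbb{E}[R]}{\mathbb{E}[R]^{2}} = \frac{1}{\mu}
\end{aligned}
```

Under these assumptions R follows a compound (Neyman type A) distribution that is overdispersed whenever λ > 0. If λ is shared across transcripts, the variance-to-mean ratio is the same at every expression level, while the negative-binomial-style dispersion φ = 1/μ is largest for transcripts with few surviving molecules, so whether low expressors look "more overdispersed" depends on which measure is used. With N held fixed (resequencing one library), R is plain Poisson.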

"...the overdispersion would be observable between libraries prepared from the same sample, but not within repeated sequencings of a single library, which should look Poisson because only read-sampling is applied." Yes, this is the consensus view as far as I know. I don't know about low versus high expressors.
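
The library-versus-resequencing prediction can be checked with a small simulation. This is a hedged sketch in plain Python (standard library only) under an assumed toy model, not a model of any real protocol: each library prep keeps N ~ Poisson(mu) molecules of one transcript, and sequencing yields R ~ Poisson(lam * N) reads; resequencing a single library keeps N fixed. The names `simulate`, `mu`, and `lam`, and the values mu = 5 and lam = 4, are hypothetical choices for illustration.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    The mean is split into chunks of at most 30 (a sum of independent
    Poissons is Poisson) so exp(-chunk) never underflows; adequate for the
    small means used here, slow for very large ones.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, > 1 if overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate(mu=5.0, lam=4.0, n=4000, seed=1):
    """Return (between-library, within-library) variance-to-mean ratios.

    Toy model (an assumption, not a real protocol): a library prep keeps
    N ~ Poisson(mu) molecules of one transcript, and sequencing yields
    R ~ Poisson(lam * N) reads from that library.
    """
    rng = random.Random(seed)
    # Independent library preps from the same sample: N is redrawn every time.
    between = [poisson(lam * poisson(mu, rng), rng) for _ in range(n)]
    # Repeated sequencing of one library: N is fixed, only read sampling varies.
    fixed_n = max(1, round(mu))
    within = [poisson(lam * fixed_n, rng) for _ in range(n)]
    return dispersion_index(between), dispersion_index(within)


if __name__ == "__main__":
    between, within = simulate()
    print(f"between libraries: var/mean = {between:.2f}")
    print(f"resequencing one library: var/mean = {within:.2f}")
```

Under these assumptions the between-library ratio should come out near 1 + lam = 5 and the resequencing ratio near 1, since only the library-prep step redraws N.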

12.2 years ago

In the NGS data I have looked at (exome and genome sequencing), I also see overdispersion. The Poisson is supposed to fit, but after spending some time modeling these data, it is clear to me that it fits only when sequencing depth is very low. The negative binomial is much more justifiable.

I don't think there is much biological meaning in these trends, and I would caution against drawing conclusions from them.

Do you see overdispersion when you compare two sets of reads from the same sample with different library preps, when you compare biological replicates, or when you compare two sets of reads from the same library?

Zev & qdjm, thanks for your comments. I looked at my own RNA-seq data to determine whether overdispersion is lower for low expressors, using var/mean as the measure (it should be 1 for Poisson data). Zev is right: it does decrease toward the low end, though values as high as 10 are still common there (and as high as 50 at the high end). Oddly, these are technical replicates, so the noise is not biological; it must be introduced after RNA purification.
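
Per-gene overdispersion across technical replicates can be summarized as below. This is a sketch: besides var/mean, it reports a second measure not used in the comment above, the method-of-moments negative binomial dispersion phi from Var = mean + phi * mean**2. The function name `overdispersion` and the replicate counts are hypothetical, made up for illustration.

```python
import statistics


def overdispersion(replicate_counts):
    """Summarize one gene's overdispersion across replicates.

    Returns (var/mean, phi): var/mean is about 1 for Poisson counts, and phi is
    the method-of-moments negative binomial dispersion in
    Var = mean + phi * mean**2 (phi = 0 for Poisson; negative values indicate
    underdispersion and are returned as-is).
    """
    mean = statistics.mean(replicate_counts)
    var = statistics.variance(replicate_counts)  # sample variance, n - 1 denominator
    if mean == 0:
        return float("nan"), float("nan")
    return var / mean, (var - mean) / mean ** 2


# Hypothetical counts for two genes across four technical replicates.
genes = {
    "low_expressor": [3, 9, 1, 7],
    "high_expressor": [950, 1210, 880, 1105],
}
for name, counts in genes.items():
    ratio, phi = overdispersion(counts)
    print(f"{name}: var/mean = {ratio:.1f}, phi = {phi:.3f}")
```

On these made-up numbers the low-count gene has the smaller var/mean but the larger phi, so the two measures can rank genes in opposite orders.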

I haven't had biological replicates. This observation comes from depth files generated with samtools.
