Question

Error in the generative model of RNA-Seq?

0

7.3 years ago

roma ▴ 120

Happy holidays!

I am currently studying the paper "RNA-Seq gene expression estimation with read mapping uncertainty".

If I understand correctly, in the generative model the probability of picking a read from a given transcript is equal to the abundance of that transcript: p(G_n=i|θ) = θ_i. Thus, if a 1kb transcript and a 10kb transcript are expressed at the same level (TPM), the model would predict a roughly equal number of reads for the two transcripts.

However, due to fragmentation, in "real" RNA-Seq the 10kb transcript would result in 10x more fragments and thus 10x more reads.

Am I missing something here, or is the model in that paper wrong?

RNA-Seq • 1.3k views


score 3 · Accepted Answer · 2016-12-30

3

7.3 years ago

Rob 6.5k

The paper is correct. I think what you're missing is that the θ parameters track (estimate) the nucleotide fractions, not the transcript fractions. So these parameters are not normalized by length; that normalization is done later to compute τ and, subsequently, TPM.
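To make the distinction concrete, here is a minimal Python sketch of the conversion between the two quantities, applied to the 1kb / 10kb example from the question. It assumes the simplified relations τ_i ∝ θ_i / ℓ_i and TPM_i = 10^6 · τ_i, and it ignores effective-length corrections and the model's noise component. The function names `tau_to_theta` and `theta_to_tau` are made up for illustration and are not taken from the paper or from any software package.

```python
def tau_to_theta(tau, lengths):
    """Transcript fractions -> read-level (nucleotide) fractions.

    Simplified: theta_i is proportional to tau_i * length_i.
    """
    weighted = [t * l for t, l in zip(tau, lengths)]
    total = sum(weighted)
    return [w / total for w in weighted]


def theta_to_tau(theta, lengths):
    """Read-level (nucleotide) fractions -> transcript fractions.

    Simplified: tau_i is proportional to theta_i / length_i.
    """
    per_base = [t / l for t, l in zip(theta, lengths)]
    total = sum(per_base)
    return [p / total for p in per_base]


# Two transcripts at the same expression level (equal transcript fractions).
lengths = [1_000, 10_000]
tau = [0.5, 0.5]

# The model samples reads according to theta, so the 10kb transcript
# is expected to receive about 10x as many reads as the 1kb one.
theta = tau_to_theta(tau, lengths)

# Length normalization recovers equal abundances for both transcripts.
tpm = [1e6 * t for t in theta_to_tau(theta, lengths)]

print("theta:", theta)
print("TPM:  ", tpm)
```

Under these assumptions θ comes out at roughly 0.09 and 0.91, so the longer transcript does get about 10x more reads, while both transcripts end up with the same TPM after normalizing by length.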