Hi all,
I want to know how many transcripts per cell 40 RPKM in TCGA-BRCA-RNAseqV2 corresponds to. How can I calculate this, or where can I find information about it?
Thanks for any help.
You can't convert this to absolute numbers with this data alone. RNA-seq doesn't sequence every mRNA molecule in the sample; it essentially draws a sample from that pool. More abundant transcripts are more likely to be 'sampled', and that's just how RNA-seq works. The read count for a gene is usually modeled as a draw from a Poisson distribution (for one sample) or from a negative binomial distribution (a Poisson with overdispersion, which accounts for the extra variability).
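To make the sampling idea concrete, here is a minimal sketch in Python. The gene names and molecule counts are invented, and the model is deliberately crude: every molecule is equally likely to produce a read, and transcript length is ignored.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical mRNA pool: gene -> number of molecules (invented numbers)
pool = {"geneA": 9000, "geneB": 900, "geneC": 100}
genes = list(pool)
weights = [pool[g] for g in genes]

def sequence(n_reads):
    """Draw n_reads reads; each read comes from a gene with probability
    proportional to that gene's share of the pool."""
    reads = random.choices(genes, weights=weights, k=n_reads)
    return {g: reads.count(g) for g in genes}

# Two "runs" on the same pool give similar proportions but different counts
run1 = sequence(1000)
run2 = sequence(1000)
print(run1)
print(run2)
```

The counts reflect each gene's share of the pool, not the number of molecules in it, which is why the read counts alone cannot be turned back into absolute numbers.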
40 RPKM (reads per kilobase of transcript per million mapped reads) means that, ignoring the transcript-length normalization to keep the argument simple, about 40 out of every million reads in the library came from that gene. So it's only a relative quantification.
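For reference, a small Python sketch of the standard RPKM formula. The function name and the read counts are made up for illustration; the point is that the same gene gets the same RPKM at two different sequencing depths, so the value carries no information about how many molecules were in the cell.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)

# Hypothetical 2 kb transcript sequenced at two depths (invented numbers)
shallow = rpkm(4_000, 2_000, 50_000_000)
deep = rpkm(8_000, 2_000, 100_000_000)
print(shallow, deep)  # both are 40.0
```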
I suppose you could get to absolute quantification by spiking in control RNAs at known amounts, estimating what fraction of the library you actually sequenced (you might call this the sampling fraction; I'm just making this term up), and extrapolating that to the genes of interest.
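A rough sketch of that extrapolation in Python, under strong assumptions that are mine and not something the TCGA data provides: the spike-ins were added at a known number of molecules per cell, the response is linear, and the spike-ins behave like endogenous transcripts. All numbers and names are invented.

```python
def molecules_per_rpkm(spike_ins):
    """Least-squares slope through the origin: molecules per cell per RPKM unit.

    spike_ins: list of (known_molecules_per_cell, observed_rpkm) pairs.
    """
    numerator = sum(m * r for m, r in spike_ins)
    denominator = sum(r * r for _, r in spike_ins)
    return numerator / denominator

# Hypothetical spike-in measurements (invented, for illustration only)
spike_ins = [(1.0, 10.0), (5.0, 50.0), (20.0, 200.0)]

slope = molecules_per_rpkm(spike_ins)
estimate = 40 * slope  # extrapolated molecules per cell for a gene at 40 RPKM
print(slope, estimate)
```

Without spike-ins in the original library prep, there is no calibration curve to fit, so this only works for experiments designed with it in mind.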