Question: RPKM Calculation for Genes
siddharth.sethi5 (Medical Research Council, Oxford, United Kingdom) wrote, 7.4 years ago:

Hi everyone

I understand the RPKM formula is as follows:

C = Number of reads mapped to a gene

N = Total mapped reads in the experiment

L = exon length in base-pairs for a gene

RPKM = (10^9 × C) / (N × L)
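The formula above as a minimal Python sketch, assuming C, N and L are already known. The function name `rpkm` and the example numbers are made up for illustration:

```python
def rpkm(count: int, total_mapped: int, length_bp: int) -> float:
    """RPKM = 10^9 * C / (N * L), with L in base pairs."""
    if total_mapped <= 0 or length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * count / (total_mapped * length_bp)


# Hypothetical gene: 500 reads on 2,000 bp of exon, 10 million mapped reads.
print(rpkm(500, 10_000_000, 2_000))  # 25.0
```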

I have the counts (from HTSeq) and the transcript length (retrieved from the Ensembl API) for each gene.

My question is: for the total mapped reads (N), should I count only the reads that fall in the exons of all genes? If that's right, can I just add up all the gene counts from the HTSeq output to get the total mapped reads?

Or should the total mapped reads be ALL the mapped reads in the BAM file?
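If you do sum the HTSeq counts, note that htseq-count output ends with summary rows whose names start with a double underscore (`__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`). These are not genes and should be left out of N. A minimal Python sketch, assuming the usual two-column tab-separated output; the function name `read_htseq_counts`, the gene IDs and the counts are hypothetical:

```python
def read_htseq_counts(lines):
    """Parse htseq-count output into {gene_id: count}, skipping summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        # Rows like __no_feature or __ambiguous are counters, not genes.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


example = """geneA\t500
geneB\t1500
geneC\t0
__no_feature\t2000
__ambiguous\t100
__too_low_aQual\t50
__not_aligned\t30
__alignment_not_unique\t400
"""
counts = read_htseq_counts(example.splitlines())
n_exonic = sum(counts.values())
print(n_exonic)  # 2000
```

With a real file you would pass an open file handle instead, e.g. `read_htseq_counts(open("counts.txt"))`, where `counts.txt` is a placeholder name. The N obtained this way is the number of reads assigned to genes, which is smaller than the number of mapped reads in the BAM.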


Can somebody confirm that this is the same equation as [link]? Why is 10^9 different from 1e6? CORRECTION: never mind, I didn't see the 1e3 incorporated in one of the other equations.

(Reply by O.rka, 3.0 years ago)
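The 10^9 is the two scalings used in other write-ups combined: per million mapped reads (10^6) and per kilobase of gene length (10^3). A quick Python check with made-up inputs that the one-step and two-step forms agree; both function names are just for illustration:

```python
import math


def rpkm_one_step(c, n, length_bp):
    return 1e9 * c / (n * length_bp)


def rpkm_two_step(c, n, length_bp):
    reads_per_million = c / (n / 1e6)             # normalise by library size (millions)
    return reads_per_million / (length_bp / 1e3)  # then by gene length (kilobases)


for c, n, length_bp in [(500, 10_000_000, 2_000), (37, 23_456_789, 1_234)]:
    assert math.isclose(rpkm_one_step(c, n, length_bp), rpkm_two_step(c, n, length_bp))
print("both forms agree")
```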
JC wrote, 7.4 years ago:

Well, maybe you don't need RPKM any more: "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples". Wagner GP, Kin K, Lynch VJ. _Theory Biosci._ 2012.

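The measure that paper argues for is usually called TPM (transcripts per million): divide each count by gene length first, then rescale so every sample sums to one million. A minimal Python sketch on two hypothetical two-gene samples with opposite composition, taking N as the sum of gene counts. The RPKM totals differ between the samples while the TPM totals do not; the function names and numbers are illustrative only:

```python
import math


def rpkm_values(counts, lengths):
    """RPKM per gene, with N taken as the sum of the gene counts."""
    n = sum(counts)
    return [1e9 * c / (n * l) for c, l in zip(counts, lengths)]


def tpm_values(counts, lengths):
    """TPM per gene: length-normalised rates rescaled to sum to one million."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]


lengths = [1_000, 5_000]  # gene lengths in bp (hypothetical)
sample1 = [100, 900]      # read counts per gene (hypothetical)
sample2 = [900, 100]

for name, sample in [("sample1", sample1), ("sample2", sample2)]:
    rpkm_sum = sum(rpkm_values(sample, lengths))
    tpm_sum = sum(tpm_values(sample, lengths))
    print(name, rpkm_sum, math.isclose(tpm_sum, 1e6))
# sample1 280000.0 True
# sample2 920000.0 True
```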

That's an interesting paper. Here is a link to the full text:

(Reply by Damian Kao, 7.4 years ago)

Nice blog post on this, Damian: "RPKM measure is inconsistent among samples". Of course, this raises the question: how many of the papers using RPKM are not quite correct?

(Reply by Istvan Albert, 7.3 years ago)
Ali (Iran, Islamic Republic Of) wrote, 7.4 years ago:

First of all, which N you choose does not matter if you are going to perform differential gene expression analysis or something similar.

Note that N is the same for every gene in a sample, so either option only rescales all of that sample's values by a constant factor.
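A small Python check of the constant-factor point: switching N from the exonic total to all mapped reads multiplies every gene's RPKM in that sample by N_all / N_exonic. The counts, lengths and the all-mapped total below are hypothetical. Note that this factor is per sample, so it can differ between samples whose exonic fraction differs.

```python
import math


def rpkm(count, total_mapped, length_bp):
    return 1e9 * count / (total_mapped * length_bp)


counts = [500, 1500, 40]       # hypothetical gene counts
lengths = [2_000, 3_000, 1_000]
n_exonic = sum(counts)         # N = reads assigned to genes (2040)
n_all_mapped = 3_000           # hypothetical: all mapped reads in the BAM

rpkm_exonic = [rpkm(c, n_exonic, l) for c, l in zip(counts, lengths)]
rpkm_all = [rpkm(c, n_all_mapped, l) for c, l in zip(counts, lengths)]

# Every gene is rescaled by the same factor.
factor = n_all_mapped / n_exonic
print(all(math.isclose(a, b * factor) for a, b in zip(rpkm_exonic, rpkm_all)))  # True
```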

If you are focusing only on the expression levels of the genes, a reasonable choice is to set N to the number of reads mapped to exons, so that the sum of C over all genes equals N.

If you are interested in miRNAs, ncRNAs and so on, you may keep N as all of the mapped reads.
