Question: RPKM Calculation For Genes
5.2 years ago by
Medical Research Council, Oxford, United Kingdom
siddharth.sethi5 wrote:

Hi everyone

I understand the RPKM formula is as follows:

C = Number of reads mapped to a gene

N = Total mapped reads in the experiment

L = exon length in base-pairs for a gene

RPKM = (10^9 * C) / (N * L)
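
As a minimal sketch, the formula above can be written as a small Python function. The function name, argument names, and the example numbers are illustrative assumptions, not part of any library.

```python
def rpkm(count, total_mapped, length_bp):
    """Reads per kilobase of exon per million mapped reads.

    count        -- C, reads mapped to the gene
    total_mapped -- N, total mapped reads in the experiment
    length_bp    -- L, exon length of the gene in base pairs
    """
    if total_mapped <= 0 or length_bp <= 0:
        raise ValueError("total_mapped and length_bp must be positive")
    return 1e9 * count / (total_mapped * length_bp)


# Hypothetical values: 500 reads on a 2,000 bp gene, 20 million mapped reads.
example = rpkm(count=500, total_mapped=20_000_000, length_bp=2_000)
print(example)  # 12.5
```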

I have the counts (from HTSeq) and the transcript length (retrieved from the Ensembl API) for each gene.

My question is: for the total mapped reads (N), should I count only the reads in the exons of all genes? If that's right, can I just add up all the gene counts from the HTSeq output to get the total mapped reads?

Or should the total mapped reads be ALL the mapped reads in the BAM file?
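
For the first option, a rough sketch of summing the per-gene counts might look like the following. It assumes the usual two-column, tab-separated htseq-count output, in which recent versions prefix the special counters (for example `__no_feature` and `__ambiguous`) with a double underscore; older versions may name them differently, so check your own file. The function name and the example data are made up for illustration.

```python
def sum_gene_counts(lines):
    """Sum per-gene counts, skipping htseq-count's special counter rows."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature, count = line.split("\t")[:2]
        # Assumption: special counters start with "__" (e.g. __no_feature).
        if feature.startswith("__"):
            continue
        total += int(count)
    return total


# Hypothetical htseq-count output for three genes.
example_output = """geneA\t100
geneB\t300
geneC\t600
__no_feature\t200
__ambiguous\t50
"""

n_exonic = sum_gene_counts(example_output.splitlines())
print(n_exonic)  # 1000
```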

rpkm rna-seq • 50k views

Can somebody confirm that this is the same equation as the one at http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/? Why is 10^9 different from 1e6? CORRECTION: never mind, I didn't see the 1e3 incorporated in one of the other equations.

Comment written 9 months ago by O.rka (modified 9 months ago)
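
The point of the correction in that comment can be spelled out: dividing by millions of mapped reads (1e6) and by kilobases of exon (1e3) gives the same result as the single 10^9 factor, using C, N, and L as defined in the question.

```latex
\mathrm{RPKM}
  = \frac{C}{\left(\dfrac{N}{10^{6}}\right)\left(\dfrac{L}{10^{3}}\right)}
  = \frac{10^{6}\cdot 10^{3}\cdot C}{N \cdot L}
  = \frac{10^{9}\cdot C}{N \cdot L}
```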
5.2 years ago by
JC
Mexico
JC wrote:

Well, maybe you don't need RPKM any more: "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples". Wagner GP, Kin K, Lynch VJ. _Theory Biosci._ 2012. PubMed
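
One way to see the kind of inconsistency the paper's title refers to is that the sum of RPKM values over all genes depends on which genes the reads fall on, so it differs between samples. The sketch below uses two made-up samples with the same total read count and compares the RPKM totals with TPM (transcripts per million), a length-normalised proportion that always sums to one million. All names and numbers are illustrative assumptions, and N is taken here as the sum of the gene counts.

```python
def rpkm_values(counts, lengths):
    """RPKM per gene, taking N as the sum of the gene counts."""
    total = sum(counts.values())
    return {g: 1e9 * counts[g] / (total * lengths[g]) for g in counts}


def tpm_values(counts, lengths):
    """TPM per gene: length-normalised rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    scale = sum(rates.values())
    return {g: 1e6 * r / scale for g, r in rates.items()}


# Two hypothetical samples with 200 reads each, on a short and a long gene.
lengths = {"gene1": 1000, "gene2": 2000}
sample_a = {"gene1": 100, "gene2": 100}
sample_b = {"gene1": 180, "gene2": 20}

sum_rpkm_a = sum(rpkm_values(sample_a, lengths).values())
sum_rpkm_b = sum(rpkm_values(sample_b, lengths).values())
sum_tpm_a = sum(tpm_values(sample_a, lengths).values())
sum_tpm_b = sum(tpm_values(sample_b, lengths).values())

print(sum_rpkm_a, sum_rpkm_b)  # the two RPKM totals differ
print(sum_tpm_a, sum_tpm_b)    # both TPM totals are 1e6 up to rounding
```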


That's an interesting paper. Here is a link to the full text: https://springerlink3.metapress.com/content/18284k158v887k57/fulltext.html

Comment written 5.1 years ago by Damian Kao

Nice blog post on this, Damian: RPKM measure is inconsistent among samples. Of course, this raises the question: just how many of the papers using RPKM are not quite correct?

Comment written 5.1 years ago by Istvan Albert (modified 5.1 years ago)
5.2 years ago by
Ali
Iran, Islamic Republic Of
Ali wrote:

First of all, the choice of N does not matter if you are going to perform differential gene expression analysis or something similar.

Note that N is common to all genes, so either option only scales all of your numbers by the same constant factor.

If you are focusing only on the expression level of the genes, a good choice is to set N to the number of reads mapped to the exons, so that the sum of C over all genes equals N.

If you are interested in miRNAs, ncRNAs and so on, you may keep N as all of the mapped reads.
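
The scaling argument above can be checked with a small sketch: computing RPKM with both choices of N for one sample gives values that differ by the same factor for every gene. The gene names, counts, lengths, and the total of 1,250 mapped reads are all made-up numbers for illustration.

```python
def rpkm_table(counts, lengths, total_mapped):
    """RPKM for every gene in one sample, for a given choice of N."""
    return {g: 1e9 * counts[g] / (total_mapped * lengths[g]) for g in counts}


# Hypothetical per-gene counts and exon lengths for one sample.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 3000}

n_exonic = sum(counts.values())  # option 1: reads counted in genes only
n_all_mapped = 1250              # option 2: all mapped reads (assumed value)

rpkm_exonic = rpkm_table(counts, lengths, n_exonic)
rpkm_all = rpkm_table(counts, lengths, n_all_mapped)

# The ratio is n_all_mapped / n_exonic for every gene, so the relative
# expression of genes within the sample is unchanged.
ratios = {g: rpkm_exonic[g] / rpkm_all[g] for g in counts}
print(ratios)
```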
