Question: How To Calculate Fpkm (Fragments Per Kilobase Of Exon Per Million Fragments Mapped)
User 5402 wrote (9.7 years ago):


Perhaps this has already been answered somewhere, but I am not seeing a satisfactory explanation. I want to understand how one calculates FPKM (fragments per kilobase of exon per million fragments mapped) in RNA-seq data. Everywhere I look, I see people saying that it is the number of reads aligned per kilobase of the transcript per million mappable reads in the total dataset, and that the difference between RPKM and FPKM is that, for paired-end data, one fragment is a pair of reads. If I have any aspect of that wrong, please inform me.

If the above is right, then how is it that Cufflinks is able to find transcripts that are as low as 10^-12 FPKM? How is that possible?

I tried a back-of-the-envelope calculation on a gene that has a very low FPKM as reported by Cufflinks. This gene's combined exons total ~3 kb. It has ~2000 reads aligned by TopHat, and the dataset has ~24 million reads in total. If I understand the calculation correctly, the gene's FPKM should be about 28, or at least somewhere near that order of magnitude. Instead, the Cufflinks output says that it has an FPKM of 2.9531e-12. What am I missing or doing wrong? How can any transcript have such a low FPKM/RPKM? If the dataset size is in the range of 10-100 million reads, then to get a number like 10^-12 with even just 1 read/fragment, wouldn't you need a transcript larger than the entire human genome?
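The back-of-the-envelope arithmetic above can be sketched in a few lines. This is only the naive count-based formula, not what Cufflinks does internally; the function names `fpkm` and `length_for_fpkm` are made up for illustration, the read counts are treated as fragment counts, and the ~3.1e9 bp human genome size is an approximate figure.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of exon per million mapped fragments."""
    # (fragments / (length_bp / 1e3)) / (total / 1e6) == fragments * 1e9 / (length_bp * total)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def length_for_fpkm(target_fpkm, fragments, total_mapped_fragments):
    """Transcript length (bp) that would yield target_fpkm under the naive formula."""
    return fragments * 1e9 / (target_fpkm * total_mapped_fragments)


# Numbers from the question: ~2000 reads, ~3 kb of exons, ~24 million reads.
value = fpkm(2000, 3000, 24_000_000)
print(f"naive FPKM: {value:.1f}")

# Length a transcript would need for a single fragment to give 1e-12 FPKM.
needed = length_for_fpkm(1e-12, 1, 24_000_000)
human_genome_bp = 3.1e9  # approximate, for comparison only
print(f"required length: {needed:.2e} bp ({needed / human_genome_bp:.0f}x the genome)")
```

Under these assumptions the naive value comes out near 28, and a single fragment would need a transcript thousands of times longer than the genome to reach 1e-12, which is why the reported number cannot come from the count-based formula alone.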

So I know I must not be understanding this right. Thank you in advance for your help!

rpkm fpkm rna • 101k views
ADD COMMENTlink modified 4.2 years ago by jokipokemon00010 • written 9.7 years ago by User 5402210

You are totally right, there is no way of getting near that number using the RPKM formula: 2000 reads / (3 kb × 24 million reads) ≈ 28. Is the FPKM formula maybe different? Is it documented how Cufflinks calculates this?

ADD REPLYlink written 9.7 years ago by Michael Dondrup48k

Paired-end based "fragments per kilobase of exon per million fragments mapped" (FPKM) is analogous to single-end based "reads per kilobase of exon per million mapped reads" (RPKM) and is "simply a nomenclature change to better reflect what RNA-Seq actually measures".

ADD REPLYlink written 9.7 years ago by Casey Bergman18k

Cufflinks uses a statistical model to calculate FPKM. It is given in the supplementary methods of the Cufflinks paper. When running Cufflinks you can even supply the mean and standard deviation of the fragment length distribution (for single-end reads). The results vary with different parameters.

ADD REPLYlink written 7.4 years ago by Bharat Iyengar300
Mikael Huss wrote (9.7 years ago):

Hmm. Are you looking at the gene level or the transcript level? If you are looking at transcript FPKMs and the gene in question has alternative transcripts, one of the isoforms could get a zero estimate while another (or several others) would get the reads assigned to it/them.
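To make the gene-level versus transcript-level distinction concrete, here is a small sketch. The isoform names and FPKM values are invented, and it assumes that the gene-level value is simply the sum of its isoform-level estimates.

```python
# Hypothetical isoform-level FPKM estimates for a single gene (made-up numbers).
isoform_fpkm = {
    "isoform_A": 0.0,   # received no reads in the estimate
    "isoform_B": 21.5,
    "isoform_C": 6.3,
}

# Assumption: gene-level FPKM is the sum over the gene's isoforms.
gene_fpkm = sum(isoform_fpkm.values())

# Isoforms that look "unexpressed" even though the gene itself is well covered.
zero_isoforms = [name for name, v in isoform_fpkm.items() if v == 0.0]

print(f"gene-level FPKM: {gene_fpkm:.1f}")
print("isoforms estimated at zero:", zero_isoforms)
```

In a case like this, a near-zero transcript FPKM would not contradict a healthy read count for the gene as a whole.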

I have never seen Cufflinks FPKMs as low as 1e-12. (Except for 0, of course!) The smallest values I get tend to be around 0.0001 (1e-4). Could it be a numerical issue, where the estimate is really zero, but the program reports a very small value instead? (Although in those cases, I think the values tend to be even smaller, like ~1e-16 depending on the machine)
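As a rough illustration of the "numerical issue" idea, the sketch below prints the double-precision machine epsilon and zeroes out values under a cutoff. The helper `clean_fpkm` and the `floor` of 1e-9 are arbitrary choices for illustration, not anything Cufflinks itself defines.

```python
import sys

# Machine epsilon for IEEE 754 doubles, about 2.22e-16.
eps = sys.float_info.epsilon
print(f"machine epsilon: {eps:.3e}")


def clean_fpkm(value, floor=1e-9):
    """Treat estimates below an arbitrary floor as zero (illustrative cutoff only)."""
    return 0.0 if abs(value) < floor else value


print(clean_fpkm(2.9531e-12))  # the value reported in the question
print(clean_fpkm(1e-4))        # a typical small but plausible estimate
```

Where to put the cutoff is a judgment call; values such as 1e-5 would pass this particular floor.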

ADD COMMENTlink written 9.7 years ago by Mikael Huss4.7k

This specific gene only has one transcript according to Cufflinks.

Do you think there could be a problem with the way I am running Cufflinks? I have just been using the defaults and running the TopHat output against the human .gtf file. I am seeing ~30 transcripts that do not fail in the Cufflinks status but have FPKM values from 1e-5 through 1e-12.

Most importantly, though, how is Cufflinks arriving at 2.9531e-12 FPKM (or 0, if that is a numerical issue) internally? That still makes no sense to me.

ADD REPLYlink written 9.7 years ago by User 5402210

I'm stumped - I can't think of any way to run the program so that you would end up with values like that. I think you'll have to email and ask the developers of Cufflinks.

ADD REPLYlink written 9.7 years ago by Mikael Huss4.7k

Did you find the reason for the low Cufflinks FPKM values?

ADD REPLYlink written 6.8 years ago by geek_y11k

Probably just a numerical issue. I would consider it zero.

ADD REPLYlink written 6.8 years ago by Mikael Huss4.7k


Powered by Biostar version 2.3.0