Question: Problem With Fpkm In Cufflinks
5.8 years ago by
Bio_ysl20
Spain
Bio_ysl20 wrote:

Hello,

I'm having trouble understanding the Cufflinks FPKMs. I use Cufflinks to obtain FPKMs from a ".sam" file produced from the de novo assembly of a transcriptome performed with MIRA3. Cufflinks gives me as output the files "isoforms.fpkm_tracking" and "genes.fpkm_tracking", and the FPKM values are the same in both files. However, when I apply the formula `FPKM = 10^9 * numreads_of_the_fragment / (total_assembled_reads * length_of_the_fragment)`, the values are totally different, with no apparent correlation...

I give you an example (Num. reads assembled: 61725)

```
contig    num.reads    length    FPKM (formula applied)    FPKM (Cufflinks)
c1                7       487            232.8670171315              402.01
c2                6       446            217.9492069373              399.86
c3                5       486            166.6758338375              287.15
c4                6       489            198.7839392516           342154.00
c5                7       712            159.2784232346           223638.00
```
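
For anyone who wants to reproduce the "formula applied" column, here is a minimal Python sketch of the formula quoted above. The function name `naive_fpkm` and the variable names are made up for illustration. It only does the plain arithmetic from this post and is not what Cufflinks computes internally.

```python
def naive_fpkm(num_reads, length, total_reads):
    """Reads per kilobase of contig per million assembled reads.

    Plain arithmetic from the post; NOT the Cufflinks estimator.
    """
    return 1e9 * num_reads / (total_reads * length)


TOTAL_READS = 61725  # "Num. reads assembled" from the post

# (contig, num.reads, length) rows copied from the table above
contigs = [
    ("c1", 7, 487),
    ("c2", 6, 446),
    ("c3", 5, 486),
    ("c4", 6, 489),
    ("c5", 7, 712),
]

for name, n_reads, length in contigs:
    print(f"{name}\t{naive_fpkm(n_reads, length, TOTAL_READS):.4f}")
```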

Can somebody explain these differences to me? What formula does Cufflinks use? Thanks!

modified 5.8 years ago by Chris Cabanski330 • written 5.8 years ago by Bio_ysl20

I think that you meant 10^6 instead of 10^9. Also, check the normalization method you used with cuffdiff. If you used the default, you actually applied a further normalization (which is better than FPKM). You can read the manual page: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth

I think 10^9 is okay, isn't it 1000*1000000?
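
Writing the definition out (fragments per kilobase of sequence per million reads) shows where the 10^9 comes from. The symbols are chosen just for this note: $n$ is the number of reads on the fragment, $L$ its length in bases, and $N$ the total number of assembled reads.

$$\mathrm{FPKM} = \frac{n}{(L/10^{3})\,(N/10^{6})} = \frac{10^{9}\,n}{L\,N}$$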

By the way, there should be SOME correlation between the Cufflinks FPKM and the one you calculate yourself... If not, something's wrong.

Yes, 10^9 should be correct... If there is any correlation, I couldn't see it, and that is what is causing me so many doubts...


Ooops. Yes, that's correct...

5.8 years ago by
Kansas City

Did you read the cufflinks paper? It's a lot more complicated than that... Not sure it's really possible to put it into a text box, even if I did claim to understand it. Supplementary methods section 3, lots of math.

5.8 years ago by
Chris Cabanski330 wrote:

Two things that may be responsible for the discrepancy:

1. Handling of multimapped reads. I believe that Cufflinks evenly divides a read between all places that it maps. For example, a read that is mapped to 4 locations is counted as .25 reads at each individual locus.
2. If any of your transcripts overlap (i.e. one gene has multiple transcripts), then Cufflinks does some sort of deconvolution to best determine which transcript a read belongs to.
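
To illustrate the first point, here is a small Python sketch of even fractional counting for multimapped reads. It is an illustration only: the function name `fractional_counts` and the toy read-to-locus mapping are hypothetical. It shows the "1/n per locus" idea described above, not the actual Cufflinks code.

```python
from collections import defaultdict


def fractional_counts(read_to_loci):
    """Split each read evenly across all loci it maps to.

    Illustrative only: a read mapped to 4 loci adds 0.25 to each.
    """
    counts = defaultdict(float)
    for loci in read_to_loci.values():
        if not loci:
            continue  # unmapped read, contributes nothing
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


# Hypothetical toy input: read id -> list of contigs it aligns to
reads = {
    "read1": ["c1"],                    # uniquely mapped
    "read2": ["c1", "c2", "c3", "c4"],  # multimapped to 4 loci
}

counts = fractional_counts(reads)
for locus in sorted(counts):
    print(locus, counts[locus])
```

With counts like these, a contig's effective read number is no longer an integer, which is one reason a hand-calculated FPKM based on raw read counts can differ from the reported one.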