Question: Problem With Fpkm In Cufflinks
Bio_ysl (Spain) wrote, 5.8 years ago:

Hello,

I'm having trouble understanding the Cufflinks FPKMs. I use Cufflinks to obtain the FPKMs from a ".sam" file produced from a de novo transcriptome assembly performed with MIRA3. Cufflinks gives me as output the files "isoforms.fpkm_tracking" and "genes.fpkm_tracking"; in both files the FPKM values are the same. However, I tried to apply the formula FPKM = 10^9 * numreads_of_the_fragment / (total_assembled_reads * length_of_the_fragment) and the values are totally different, with no apparent correlation...

I give you an example (Num. reads assembled: 61725)

contig    num.reads    length    FPKM (formula applied)    FPKM (Cufflinks)
c1        7            487       232.8670171315            402.01
c2        6            446       217.9492069373            399.86
c3        5            486       166.6758338375            287.15
c4        6            489       198.7839392516            342154.00
c5        7            712       159.2784232346            223638.00
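
For reference, the hand-calculated column above can be reproduced with a few lines of Python. This is only a sketch of the simple formula quoted in the post, not of what Cufflinks does internally; the function name `naive_fpkm` is made up for illustration, and the read counts and lengths are copied from the table.

```python
# Naive FPKM: reads per kilobase of contig per million assembled reads.
# This is the simple formula from the post, NOT Cufflinks' own estimator.

TOTAL_ASSEMBLED_READS = 61725  # "Num. reads assembled" from the post

# (contig, number of reads, contig length in bases), copied from the table
CONTIGS = [
    ("c1", 7, 487),
    ("c2", 6, 446),
    ("c3", 5, 486),
    ("c4", 6, 489),
    ("c5", 7, 712),
]


def naive_fpkm(num_reads, length, total_reads):
    """Return 10^9 * num_reads / (total_reads * length)."""
    return 1e9 * num_reads / (total_reads * length)


if __name__ == "__main__":
    for name, reads, length in CONTIGS:
        value = naive_fpkm(reads, length, TOTAL_ASSEMBLED_READS)
        print(f"{name}\t{reads}\t{length}\t{value:.4f}")
```

Comparing this column with the Cufflinks column (for example by ranking the contigs in both) is a quick way to see whether any correlation exists at all.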

Can somebody explain these differences to me? What formula does Cufflinks use? Thanks!

modified 5.8 years ago by Chris Cabanski • written 5.8 years ago by Bio_ysl

I think that you meant 10^6 instead of 10^9. Also, check the normalization method you used with cuffdiff. If you used the default, you actually applied a further normalization (which is better than FPKM). You can read the manual page: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth

written 5.8 years ago by Fabio Marroni

I think 10^9 is okay, isn't it 1000*1000000?

By the way, there should be SOME correlation with the cufflinks RPKM and the one you calculate yourself... If not, something's wrong.
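
To spell out where the 10^9 comes from: FPKM is fragments per kilobase of sequence per million reads, so the length is divided by 10^3 and the read total by 10^6. The symbols N (reads on the contig), L (contig length in bases) and T (total assembled reads) are just labels chosen here, and this covers only the simple formula from the question, not the Cufflinks model.

```latex
\mathrm{FPKM}
  = \frac{N}{\left(L / 10^{3}\right)\left(T / 10^{6}\right)}
  = \frac{10^{3} \cdot 10^{6} \cdot N}{L \cdot T}
  = \frac{10^{9}\, N}{L \cdot T}
```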

written 5.8 years ago by Madelaine Gogol

Yes, 10^9 should be correct... If there is any correlation, I couldn't see it, and that is what is causing me so many doubts...

written 5.8 years ago by Bio_ysl

Ooops. Yes, that's correct...

written 5.8 years ago by Fabio Marroni
Madelaine Gogol (Kansas City) wrote, 5.8 years ago:

Did you read the cufflinks paper? It's a lot more complicated than that... Not sure it's really possible to put it into a text box, even if I did claim to understand it. Supplementary methods section 3, lots of math.

Chris Cabanski wrote, 5.8 years ago:

Two things that may be responsible for the discrepancy:

  1. Handling of multimapped reads. I believe that Cufflinks evenly divides a read between all places that it maps. For example, a read that is mapped to 4 locations is counted as .25 reads at each individual locus.
  2. If any of your transcripts overlap (i.e. one gene has multiple transcripts), then Cufflinks does some sort of deconvolution to best determine which transcript a read belongs to.

Your read counting program probably addresses these 2 cases differently than Cufflinks.
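
The first point can be illustrated with a small sketch. It assumes the even-split behaviour described above (which this answer states only as a belief) and is not taken from the Cufflinks source; the function name `fractional_counts` and the read-to-locus mapping are hypothetical.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Count reads per locus, splitting multimapped reads evenly.

    alignments: dict mapping a read ID to the list of loci it maps to.
    A read mapped to n loci adds 1/n to each of them (an assumed rule).
    """
    counts = defaultdict(float)
    for read_id, loci in alignments.items():
        if not loci:
            continue  # unmapped read: contributes nothing
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


if __name__ == "__main__":
    # Made-up example: read2 maps to four loci, so each gets 0.25 from it.
    example = {
        "read1": ["c1"],
        "read2": ["c1", "c2", "c3", "c4"],
    }
    print(fractional_counts(example))
```

A counter that instead assigns every multimapped read in full to each locus, or discards such reads, would give different per-contig totals and therefore different FPKM values.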


Thank you for the answer! But in this case we are talking about a de novo assembly done with MIRA3, and I'm not sure whether a read can be assembled into more than one contig...

written 5.8 years ago by Bio_ysl