2
2
9.3 years ago
Bio_ysl ▴ 20

Hello,

I'm having trouble understanding the Cufflinks FPKMs. I use Cufflinks to obtain the FPKMs from a ".sam" file obtained from the de novo assembly of a transcriptome, performed with MIRA3. Cufflinks gives me as output the files "isoforms.fpkm_tracking" and "genes.fpkm_tracking"; in both files the FPKM values are the same. However, I tried to apply the formula FPKM = 10^9 * numreads_of_the_fragment / (total_assembled_reads * length_of_the_fragment), and the values are totally different, with no apparent correlation...

Here is an example (number of reads assembled: 61725):

contig    num.reads    length    FPKM (formula applied)    FPKM (Cufflinks)
c1                7       487            232.8670171315              402.01
c2                6       446            217.9492069373              399.86
c3                5       486            166.6758338375              287.15
c4                6       489            198.7839392516           342154.00
c5                7       712            159.2784232346           223638.00
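
The "formula applied" column can be reproduced with a few lines of Python. This is only a sketch of the simple formula quoted above, not of what Cufflinks does internally; the function name `naive_fpkm` and the variable names are made up for illustration.

```python
def naive_fpkm(num_reads, length_bp, total_reads):
    """Simple FPKM: reads per kilobase of contig per million assembled reads."""
    return 1e9 * num_reads / (total_reads * length_bp)


TOTAL_READS = 61725  # number of assembled reads, from the example above

# (contig, num.reads, length) taken from the table above
contigs = [
    ("c1", 7, 487),
    ("c2", 6, 446),
    ("c3", 5, 486),
    ("c4", 6, 489),
    ("c5", 7, 712),
]

fpkm_by_contig = {
    name: naive_fpkm(reads, length, TOTAL_READS)
    for name, reads, length in contigs
}

for name, value in fpkm_by_contig.items():
    print(f"{name}\t{value:.4f}")
```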


Can somebody explain these differences to me? What formula does Cufflinks use? Thanks!

2

I think that you meant 10^6 instead of 10^9. Also, check the normalization method you used with cuffdiff. If you used the default, you actually applied a further normalization (which is better than FPKM). You can read the manual page: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth

0

I think 10^9 is okay, isn't it 1000*1000000?
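
Writing the units out shows where the factor comes from, assuming the usual reading of FPKM as fragments per kilobase per million fragments. The symbols are my own shorthand: $N_f$ is the number of reads on the fragment, $L$ its length in bases, and $N_{\mathrm{total}}$ the total number of assembled reads.

```latex
\mathrm{FPKM}
  = \frac{N_f}{\left(L / 10^{3}\right)\left(N_{\mathrm{total}} / 10^{6}\right)}
  = \frac{10^{3} \cdot 10^{6} \, N_f}{L \, N_{\mathrm{total}}}
  = \frac{10^{9} \, N_f}{L \, N_{\mathrm{total}}}
```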

By the way, there should be SOME correlation between the Cufflinks RPKM and the one you calculate yourself... If not, something's wrong.

0

Yes, 10^9 should be correct... If there is any correlation, I couldn't see it, and that is what is giving me so many doubts...

1

Ooops. Yes, that's correct...

3
9.3 years ago

Did you read the Cufflinks paper? It's a lot more complicated than that... Not sure it's really possible to put it into a text box, even if I did claim to understand it. See supplementary methods section 3: lots of math.

3
9.3 years ago

Two things that may be responsible for the discrepancy:

1. Handling of multimapped reads. I believe that Cufflinks evenly divides a read between all places that it maps. For example, a read that is mapped to 4 locations is counted as .25 reads at each individual locus.
2. If any of your transcripts overlap (i.e. one gene has multiple transcripts), then Cufflinks does some sort of deconvolution to best determine which transcript a read belongs to.
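
To make point 1 concrete, here is a minimal sketch of the even-split idea: a read that maps to n places adds 1/n to each of them. It only illustrates the counting scheme described above and is not Cufflinks code; the function name `fractional_counts` and the toy alignments are hypothetical.

```python
from collections import defaultdict


def fractional_counts(read_to_loci):
    """Split each read evenly across all loci it maps to (hypothetical helper)."""
    counts = defaultdict(float)
    for loci in read_to_loci.values():
        if not loci:
            continue  # unmapped read contributes nothing
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


# Toy alignments (made up): read2 maps to 4 contigs, so each gets 0.25 of it
alignments = {
    "read1": ["c1"],
    "read2": ["c1", "c2", "c3", "c4"],
    "read3": ["c2"],
}

counts = fractional_counts(alignments)
for locus in sorted(counts):
    print(locus, counts[locus])
```

With counts like these, the read totals plugged into the simple FPKM formula are no longer whole numbers, which is one way hand-calculated values can drift from the reported ones.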

0

Thank you for the answer! But in this case we are talking about a de novo assembly done with MIRA3, and I'm not sure whether a read can be assembled into more than one contig...