Question: Differences between Cufflinks results and home-made FPKM values
16 months ago by
Rennes, France
nicolas.hipp0 wrote:

Hi everyone,

I am having trouble understanding how Cufflinks calculates FPKM.

There are a lot of posts on this issue, and I read that FPKM = number of mapped fragments / (gene length / 1000) / (library size / 10^6).

I have output from featureCounts, which gives me raw counts and the length of each gene. When I do the calculation with these values, I cannot reproduce the Cufflinks output for the same .bam file, e.g.:

Gene    Cufflinks FPKM    raw.count    Gene length (bp)    Manual FPKM
BACH2   31.6243           26315        9786                30.04520196
GAPDH   3490.14           358474       2981                1343.608215

Library size: 89500000 (used for both genes)
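
For reference, the manual calculation from the table above can be reproduced with a few lines of Python. This is only a sketch of the simple formula quoted earlier in the post, not of what Cufflinks does internally; the function name `fpkm` and the dictionary layout are illustrative choices.

```python
def fpkm(fragments, gene_length_bp, library_size):
    """Fragments per kilobase of gene per million fragments in the library."""
    return fragments / (gene_length_bp / 1_000) / (library_size / 1_000_000)


LIBRARY_SIZE = 89_500_000  # library size from the table above

# gene -> (raw count, gene length in bp), both taken from the featureCounts output
genes = {
    "BACH2": (26315, 9786),
    "GAPDH": (358474, 2981),
}

manual = {
    name: fpkm(count, length, LIBRARY_SIZE)
    for name, (count, length) in genes.items()
}

for name, value in manual.items():
    print(f"{name}\t{value:.4f}")
```

The results match the "Manual FPKM" column, so the arithmetic itself is fine; any difference from Cufflinks has to come from the inputs (fragment count and gene length).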

Interestingly, BACH2 has similar FPKM values, but I don't understand why the two values for GAPDH are so different. Does anybody know how this is possible? The only thing I can think of is that the gene length for GAPDH is wrong, and I can't find anything about how featureCounts calculates this length (is it the ORF size? how does it deal with isoforms?).

If anybody has an idea :)

Thanks a lot, nicolas

rna-seq • 670 views
16 months ago by nicolas.hipp0
16 months ago by
Freiburg, Germany
Devon Ryan90k wrote:

The values you use and those used by cufflinks will be different for "number of mapped fragments" and "length of gene". Cufflinks is trying to handle multimappers and will also be using more of an "effective length" for each gene, which will likely vary by sample.

If you look at a single gene in your GTF file then I expect you'll quickly realize what featureCounts is doing to get a gene length.
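
As a rough illustration of what to look for in the GTF: my understanding is that the featureCounts "Length" column is the number of non-overlapping exonic bases per gene, i.e. all exons of all transcripts merged together. The sketch below assumes that definition; the exon coordinates are made up, and `union_length` is a hypothetical helper, not part of featureCounts.

```python
def union_length(exons):
    """Number of bases covered by at least one exon.

    exons: iterable of (start, end) pairs, 1-based and inclusive as in GTF.
    """
    total = 0
    block_start = block_end = None
    for start, end in sorted(exons):
        if block_end is None or start > block_end + 1:
            # The current exon starts a new block: close the previous one.
            if block_end is not None:
                total += block_end - block_start + 1
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            block_end = max(block_end, end)
    if block_end is not None:
        total += block_end - block_start + 1
    return total


# Hypothetical exons from two isoforms of one gene (not real BACH2 coordinates).
example_exons = [(100, 199), (150, 249), (400, 449)]

gene_length = union_length(example_exons)
naive_sum = sum(end - start for start, end in example_exons)

print(gene_length, naive_sum)
```

Note that simply summing `end - start` over every exon line both double-counts overlapping exons and drops one base per exon, because GTF coordinates are inclusive.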

16 months ago by Devon Ryan90k

Oh yes, thanks for the explanation, I forgot about the multi-mapper parameters... sorry for the naive question.

I am still trying to understand how the length is calculated. When I extract the entries for the BACH2 gene, for example, I find that the GTF file contains the positions of each exon and UTR, plus the gene and "transcript" entries. By summing the differences between start and end positions, I guess that the gene length is the sum of the lengths of all transcribed exons? If that is the case, I don't get the same value as Cufflinks does (about 800 bp difference between the two).

Thanks a lot, nicolas

16 months ago by nicolas.hipp0

Cufflinks is taking the weighted average of expressed transcript lengths (weighted by their relative expression), or something close to that.
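
A minimal sketch of that idea, assuming the gene-level length is a plain abundance-weighted mean of isoform lengths. The isoform lengths and relative abundances below are invented for illustration, and `weighted_mean_length` is a hypothetical helper, not a Cufflinks function.

```python
def weighted_mean_length(isoforms):
    """Mean transcript length weighted by relative expression.

    isoforms: list of (length_bp, relative_abundance) pairs.
    """
    total_weight = sum(abundance for _, abundance in isoforms)
    if total_weight == 0:
        raise ValueError("no expressed isoform")
    weighted_sum = sum(length * abundance for length, abundance in isoforms)
    return weighted_sum / total_weight


# Hypothetical gene with a long, dominant isoform and a short, minor one.
isoforms = [(3000, 0.8), (1200, 0.2)]

effective_gene_length = weighted_mean_length(isoforms)
print(effective_gene_length)
```

Because the weights come from the estimated isoform abundances, this length can change from sample to sample, unlike the fixed per-gene length in the featureCounts output.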

16 months ago by Devon Ryan90k

Ok thanks for the help ;)

16 months ago by nicolas.hipp0
