Question: Differences between Cufflinks results and home-made FPKM values
2.8 years ago by
Rennes, France
nicolas.hipp0 wrote:

Hi everyone,

I am having trouble understanding how Cufflinks calculates FPKM.

There are a lot of posts on this issue. From what I have read, FPKM = number of mapped fragments / (gene length / 1000) / (library size / 10^6).

I have output from featureCounts, which gives me raw counts and the length of each gene. When I do the calculation with these values, I cannot reproduce the Cufflinks output for the same .bam file, e.g.:

Gene     Cufflinks FPKM   Raw count   Gene length   Manual FPKM
BACH2    31.6243          26315       9786          30.04520196
GAPDH    3490.14          358474      2981          1343.608215

Library size: 89500000 (same for both genes)
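For reference, the manual calculation above can be reproduced with a few lines of Python. This is only a sketch of the simple formula quoted in the question, using the counts, lengths and library size from the table; the function name `fpkm` is made up for illustration, and this is not how Cufflinks itself derives its values.

```python
def fpkm(fragments, gene_length_bp, library_size):
    """Simple FPKM: fragments per kilobase of gene per million fragments in the library."""
    return fragments / (gene_length_bp / 1_000) / (library_size / 1_000_000)


# Values taken from the table in the question.
library_size = 89_500_000
genes = {
    "BACH2": (26315, 9786),    # (raw count, gene length reported by featureCounts)
    "GAPDH": (358474, 2981),
}

manual = {name: fpkm(count, length, library_size)
          for name, (count, length) in genes.items()}

for name, value in manual.items():
    print(f"{name}\t{value:.4f}")
```

The two results agree with the "Manual FPKM" column, so the arithmetic itself is consistent; the gap to the Cufflinks values has to come from the inputs (fragment count and gene length).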

Interestingly, BACH2 has similar FPKM values with both methods, but I don't understand why the two values for GAPDH are so different. Does anybody know how this is possible? The only explanation I can see is that the gene length for GAPDH is wrong, and I can't find anything about how featureCounts calculates this length (is it the ORF size? How does it deal with isoforms?).

If anybody has an idea :)

Thanks a lot, nicolas

rna-seq • 1.2k views
2.8 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

The values you use and those used by cufflinks will be different for "number of mapped fragments" and "length of gene". Cufflinks is trying to handle multimappers and will also be using more of an "effective length" for each gene, which will likely vary by sample.

If you look at a single gene in your GTF file then I expect you'll quickly realize what featureCounts is doing to get a gene length.
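As a rough illustration of that gene length: featureCounts reports, for each gene, the number of non-overlapping exonic bases, i.e. the union of all exons across the gene's transcripts. The sketch below assumes that definition and uses made-up exon coordinates (not taken from any real GTF); `union_exon_length` is a hypothetical helper name. Note that GTF coordinates are 1-based and inclusive, so each interval covers end - start + 1 bases.

```python
def union_exon_length(exons):
    """Count bases covered by at least one exon.

    `exons` is an iterable of (start, end) pairs in GTF convention
    (1-based, inclusive), pooled over all transcripts of one gene.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # No overlap with the current merged block: close it, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping exon: extend the current merged block if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Made-up exons from two hypothetical isoforms of one gene.
toy_exons = [(100, 199), (150, 249), (400, 449), (400, 449)]
toy_length = union_exon_length(toy_exons)
print(toy_length)
```

Summing the four toy exons individually would give 300 bases, whereas the union is smaller because shared bases are counted once. A gene with many overlapping isoforms can therefore have a length that differs a lot from any naive per-exon sum.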


Oh yes, thanks for the explanation. I forgot about the multi-mapper parameters... sorry for the naive question.

I am still trying to understand how the length is calculated. When I extract the information for the BACH2 gene, for example, I find that the GTF file contains entries for each exon, UTR, the gene and the "transcript" values. Summing the differences between the start and end positions, I guess that the gene length is the sum of the lengths of all transcribed exons? If that is the case, I don't get the same value as Cufflinks does (there is a difference of about 800 bp between the two).

Thanks a lot nicolas

modified 2.8 years ago • written 2.8 years ago by nicolas.hipp0

Cufflinks is taking the weighted average of expressed transcript lengths (weighted by their relative expression), or something close to that.
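To make the idea concrete, here is a minimal sketch of an expression-weighted average length. The isoform lengths and abundances are invented for illustration, `expression_weighted_length` is a hypothetical name, and this only approximates the description above; it is not Cufflinks' actual code.

```python
def expression_weighted_length(transcripts):
    """Average transcript length, weighted by relative expression.

    `transcripts` is a list of (length_bp, abundance) pairs for the
    isoforms of one gene; abundances can be in any consistent unit.
    """
    total_abundance = sum(abundance for _, abundance in transcripts)
    if total_abundance == 0:
        raise ValueError("no expressed transcripts for this gene")
    return sum(length * abundance
               for length, abundance in transcripts) / total_abundance


# Invented example: a long isoform expressed three times more than a short one.
toy_transcripts = [(9000, 3.0), (2000, 1.0)]
weighted_length = expression_weighted_length(toy_transcripts)
print(weighted_length)
```

Because the weights depend on which isoforms are expressed, such a length changes from sample to sample, unlike the fixed per-gene length reported by featureCounts.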

written 2.8 years ago by Devon Ryan97k

Ok thanks for the help ;)

written 2.8 years ago by nicolas.hipp0