Question

Cufflinks And Cuffdiff Fpkm Values In Galaxy

11.6 years ago

KS ▴ 380

Hello everyone,

I am comparing paired-end RNA-Seq data from two samples (control and treated) on Galaxy. The FPKM values in the cufflinks output for the two samples differ from the two values that cuffdiff reports.

Below is a snapshot of one gene with different FPKM values. (I am not displaying the gene name.)

Cufflinks output:

Gene     length     coverage     FPKM     FPKM_conf_lo     FPKM_conf_hi     FPKM_status     Sample
xyz     -     -     39.9094     38.9553     40.8635     OK     Control
xyz     -     -     19.4664     18.7786     20.1542     OK     Treated

cuffdiff output:

Gene      sample_1     sample_2     status     value_1/Control     value_2/ treated     log2(fold_change)     test_stat     p_value     q_value     significant
xyz     q1     q2     OK     59.6706     27.1551     -1.1358     5.06151     4.16E-07     4.00E-05     yes

I assumed that the FPKM values in the cufflinks and cuffdiff outputs should match. Any suggestions are appreciated.
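One quick sanity check, sketched here in Python with the numbers copied from the two tables above, is to compare the fold change implied by each tool instead of the raw FPKM values. The function names are made up for illustration and are not part of cufflinks or cuffdiff.

```python
import math

# FPKM values for gene "xyz", copied from the two tables above.
cufflinks = {"Control": 39.9094, "Treated": 19.4664}
cuffdiff = {"Control": 59.6706, "Treated": 27.1551}


def log2_fold_change(values):
    """Return log2(Treated / Control), the direction cuffdiff reports."""
    return math.log2(values["Treated"] / values["Control"])


def scale_factors(a, b):
    """Return the per-sample ratio b / a between two sets of FPKM values."""
    return {sample: b[sample] / a[sample] for sample in a}


lfc_cufflinks = log2_fold_change(cufflinks)
lfc_cuffdiff = log2_fold_change(cuffdiff)
factors = scale_factors(cufflinks, cuffdiff)

print(f"log2 fold change from cufflinks FPKMs: {lfc_cufflinks:.4f}")
print(f"log2 fold change from cuffdiff FPKMs:  {lfc_cuffdiff:.4f}")
for sample, factor in factors.items():
    print(f"cuffdiff / cufflinks for {sample}: {factor:.3f}")
```

If the two fold changes are close while the per-sample ratios are not exactly 1, the tools broadly agree on the relative change and differ mainly in how each sample's values are scaled.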

Thanks

galaxy cufflinks cuffdiff

updated 10.3 years ago by Ashley Stephen Doane ▴ 20 • written 11.6 years ago by KS ▴ 380

score 4 · Answer 1 · 2013-03-11

Hi, as you can read in the online documentation at http://cufflinks.cbcb.umd.edu/howitworks.html#reps, the cuffdiff tool computes FPKM in a slightly different way than cufflinks: it uses a dispersion model derived from all the samples you are analyzing. This passage may help:

"Cuffdiff takes an approach to differential expression analysis that is radically different from most other RNA-seq analysis packages. Because Cufflinks calculates individual transcript abundances, it is very sensitive when looking for differentially expressed genes, especially when those genes are alternatively spliced. However, in order to deal with the overdispersion that is known to exist among biological replicates, Cuffdiff fits a model for fragment count variances in each condition prior to doing any testing. Cuffdiff uses the LOCFIT regression package, written by Catherine Loader and Jiayang Sun, for this purpose. Cuffdiff models fragment count overdispersion the same way Anders and Huber do in their DESeq package to derive a count dispersion model for each experimental condition. If only one replicate is available in each condition, Cuffdiff pools the conditions together to derive a dispersion model. The dispersion model, which describes variances of fragment counts across replicates, is then used to calculate the variances on a gene's relative expression level across replicates. It is these expression level variances that are used during testing for differences at the gene and transcript level."
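As a side note, FPKM also depends on the library size used as the denominator, so the same fragment count can give different FPKM values if a tool normalizes library sizes across samples. The sketch below only illustrates the standard FPKM formula. The function name and every number in it are hypothetical, and it does not reproduce what cufflinks or cuffdiff actually do internally.

```python
def fpkm(fragments, transcript_length_bp, library_size):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * library_size)


# Hypothetical numbers for illustration only, not taken from this thread.
fragments = 1200                 # fragments assigned to one transcript
length_bp = 2000                 # transcript length in base pairs
own_library = 20_000_000         # mapped fragments in this sample alone
normalized_library = 15_000_000  # assumed size after cross-sample normalization

fpkm_single = fpkm(fragments, length_bp, own_library)
fpkm_joint = fpkm(fragments, length_bp, normalized_library)

print(f"FPKM with the sample's own library size: {fpkm_single:.2f}")
print(f"FPKM with a normalized library size:     {fpkm_joint:.2f}")
```

With identical fragment counts and transcript length, only the denominator changes, yet the reported FPKM shifts in proportion.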

Regards

score 0 · Answer 2 · 2013-10-06

Hi! I am currently having a similar problem to yours. When I run the cuffdiff package (with reference) I get FPKM values that differ greatly from the FPKM values per replicate (n=3) obtained when doing the same with cufflinks. Let me show you an example:

These are the FPKM values from the cuffdiff output:

        A    B    C    D    E    F
gene    107.894    62.4416    16.6914    2.18289    0.196219    0.977153
gene.1    59.4121    34.5872    8.18243    2.01778    0.210608    1.06329
gene.2    41.2712    22.9315    7.98328    0.256128    3.48E-06    0

And these are the FPKM values from the cufflinks output:

    A1    A2    A3    B1    B2    B3    C1    C2    C3    D1    D2    D3    E1    E2    E3    F1    F2    F3
gene    95.6597    4.32227    62.5467    3.08228    2.99791    162.195    2.68714    3.41266    2.78355    3.56007    203.075    2.20681    3.39579    3.08628    99.7982    3.43312    1.28864    3.47551
gene.1    53.2729    0.383551    8.15E-07    202.532    149.746    1.06E-06    220.887    1.48E-06    0.82031    301.761    203.075    1.16E-06    277.827    273.576    32.3428    284.11    254.081    33.9261
gene.2    42.3868    1.02048    3.79E-07    2.5629    2.99791    162.195    38.224    2.97E-07    0.403575    1.62E-06    1.8266    4.91E-07    31.9512    9.99E-07    275.299    0.524192    33.0755    263.226

As you can see, it is hard to see how the per-replicate gene-level FPKM values from cufflinks (A1, A2 and A3 in the gene row) could produce the cuffdiff FPKM value for that condition (A). The same goes for each transcript isoform (gene.1 and gene.2).
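To put numbers on that comparison, here is a small Python sketch that averages the cufflinks replicates for the gene row and sets the mean next to the cuffdiff value for each condition. A plain mean is only an assumption about how one might expect replicates to be combined; it is not how cuffdiff derives its values.

```python
from statistics import mean

# Gene-level FPKM values copied from the cuffdiff table above.
cuffdiff_gene = {
    "A": 107.894, "B": 62.4416, "C": 16.6914,
    "D": 2.18289, "E": 0.196219, "F": 0.977153,
}

# Per-replicate gene-level FPKM values copied from the cufflinks table above.
cufflinks_gene = {
    "A": [95.6597, 4.32227, 62.5467],
    "B": [3.08228, 2.99791, 162.195],
    "C": [2.68714, 3.41266, 2.78355],
    "D": [3.56007, 203.075, 2.20681],
    "E": [3.39579, 3.08628, 99.7982],
    "F": [3.43312, 1.28864, 3.47551],
}

# Plain arithmetic mean of the three replicates per condition.
replicate_means = {cond: mean(reps) for cond, reps in cufflinks_gene.items()}

for cond, reps in cufflinks_gene.items():
    print(
        f"{cond}: replicate range {min(reps):.3f} to {max(reps):.3f}, "
        f"mean {replicate_means[cond]:.3f}, cuffdiff {cuffdiff_gene[cond]:.3f}"
    )
```

The replicate range printed for each condition also shows how widely the three cufflinks runs disagree with one another before any comparison with cuffdiff is made.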

Can anyone point me in the right direction to go from the cufflinks FPKMs to the cuffdiff FPKMs?

Thank you very much in advance!

Best regards, Andrew

score 0 · Answer 3 · 2013-12-30

I'm also having an issue with results from cuffdiff in Galaxy: I'm getting a lot of FPKM values of 0 (zero). For every significant gene at p<0.001 in a two-condition comparison, one of the conditions has a value of zero.

What is most troubling, however, is that running the exact same .bam and gtf files with the exact same options on the GenePattern server does not produce such a result. That is, the genes with the highest significance (lowest p) do not have FPKM values of zero in either sample 1 or sample 2, unlike what Galaxy reports.

If anyone has thoughts, much appreciated.