Question

cuffdiff and changes in FPKM values due to isoforms


8.9 years ago

jo_grodem • 0

Hi, I hope someone can help with this or point me to the right thread: I need help on how to report RNA-seq data accurately.

I recently joined a lab that used a service provider to perform two rounds of RNA-seq and the downstream bioinformatics. We received tables with FPKM values for each gene in each treatment, and tables with differential expression analysis (of genes, not isoforms). My question concerns the cuffdiff output.

The setup is: (round 1) treatment 1 vs 2, and (round 2) treatments 3 vs 4, 3 vs 5, and 4 vs 5, each performed in triplicate.

For about 2% of the genes, the FPKM values differ between the two sets of tables. The company attributed this to isoforms but said that the FPKM values in both tables are correct.
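To see how isoforms alone can shift a gene-level FPKM, here is a toy calculation. Everything in it is hypothetical (an invented gene with two isoforms, invented read counts and library size); it is not derived from the tables in this post and does not explain the specific numbers in the example. It assumes the gene value is reported as the sum of its isoform FPKMs (my understanding of how Cufflinks builds gene-level values) and compares that with dividing all of the gene's reads by a single collapsed gene length.

```python
# Toy example: one hypothetical gene with two isoforms; every number is made up.

def fpkm(reads: float, length_bp: float, total_mapped: float) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return reads / ((length_bp / 1e3) * (total_mapped / 1e6))


TOTAL_MAPPED = 10_000_000  # hypothetical library size

# Hypothetical isoforms: name -> (reads assigned to the isoform, isoform length in bp).
# Assumption: the short isoform lies entirely inside the long one, so the
# gene's collapsed (union) length equals the long isoform's 3,000 bp.
isoforms = {"iso_short": (600, 1_000), "iso_long": (600, 3_000)}
UNION_LENGTH_BP = 3_000

# Gene value as the sum of per-isoform FPKMs: 60 + 20
gene_fpkm_from_isoforms = sum(fpkm(r, length, TOTAL_MAPPED) for r, length in isoforms.values())

# Gene value as all of the gene's reads over one collapsed length: 1,200 reads / 3 kb
gene_reads = sum(r for r, _ in isoforms.values())
gene_fpkm_union = fpkm(gene_reads, UNION_LENGTH_BP, TOTAL_MAPPED)

print(f"sum of isoform FPKMs: {gene_fpkm_from_isoforms:.1f}")  # 80.0
print(f"union-length FPKM:    {gene_fpkm_union:.1f}")          # 40.0
```

The same 1,200 reads give a gene-level FPKM of 80 or 40 depending on how the isoforms are handled, so gene values from two differently configured runs need not agree.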

For example, for one gene of interest:

(A) FPKM values taken from the FPKM tables: T1 (36.3145), T2 (38.0397), T3 (36.489), T4 (34.001), T5 (38.242)

(B) FPKM values taken from the pairwise comparison datasets:

T1 vs T2: T1 (0.867), T2 (1.693)

T3 vs T4, T3 vs T5, T4 vs T5: T3 (36.489), T4 (34.001), T5 (38.242)

As you can see, the values change for the first two treatments, whereas for the last three they are the same in the differential expression analysis. This worries us because treatments 1 and 3 are identical, just repeated in different rounds of RNA-seq.

I want to report the expression of this gene in each treatment and between treatments, but the result differs depending on whether I take the values from (A) or from (B).
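To put numbers on the difference, the T2 vs T1 fold change can be computed from each set of values quoted above. This is plain arithmetic on the posted FPKMs (a ratio of two numbers that ignores the replicates), not a differential expression test.

```python
import math

# FPKM values for the gene of interest, copied from the example in this post.
table_a = {"T1": 36.3145, "T2": 38.0397}  # (A) per-treatment FPKM tables
table_b = {"T1": 0.867, "T2": 1.693}      # (B) T1 vs T2 pairwise comparison


def log2_fold_change(fpkm_values: dict, numerator: str, denominator: str) -> float:
    """log2 of the ratio of two FPKM values (no pseudocount, no replicate information)."""
    return math.log2(fpkm_values[numerator] / fpkm_values[denominator])


lfc_a = log2_fold_change(table_a, "T2", "T1")  # roughly 0.07, i.e. about 1.05-fold
lfc_b = log2_fold_change(table_b, "T2", "T1")  # roughly 0.97, i.e. about 1.95-fold
print(f"T2 vs T1 from (A): log2FC = {lfc_a:.2f} ({2 ** lfc_a:.2f}-fold)")
print(f"T2 vs T1 from (B): log2FC = {lfc_b:.2f} ({2 ** lfc_b:.2f}-fold)")
```

From (A) the gene looks essentially unchanged; from (B) it looks roughly 2-fold up, so the choice of table changes the conclusion, not just the decimals.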

When I asked the company how to handle this correctly, they said that "it seems that the values of fpkm tables fit better"... Can I just pick and choose?! Does anyone know the best way to report this accurately?

Thank you in advance

RNA-Seq


Ram · Answer 1 · 2015-05-13

I cannot comment directly on the cause of the differences, but I would advise caution about using FPKM to report gene expression values; it is no longer state of the art.

There have been several reports of problems with FPKM on BioStars, both from users and in multiple publications. Since you do not seem to be interested in isoform expression, using FPKM is not justified, imo.
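One commonly cited issue is that FPKM values do not sum to the same total in every sample, so a given FPKM can represent a different share of the transcriptome in different samples; TPM rescales each sample so its values sum to one million. A minimal sketch with made-up counts for three genes (the gene names, lengths and counts are all hypothetical):

```python
# Hypothetical gene lengths (kb) and raw counts for two samples; all numbers are made up.
lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 4.0}
counts = {
    "sample1": {"geneA": 100, "geneB": 200, "geneC": 700},
    "sample2": {"geneA": 300, "geneB": 400, "geneC": 300},
}


def fpkm(sample_counts: dict, lengths_kb: dict) -> dict:
    # Divide by library size (in millions) first, then by length (in kb).
    # In practice the library size is all mapped reads, not just these genes.
    millions = sum(sample_counts.values()) / 1e6
    return {g: c / millions / lengths_kb[g] for g, c in sample_counts.items()}


def tpm(sample_counts: dict, lengths_kb: dict) -> dict:
    # Divide by length first, then rescale so the values sum to one million.
    per_kb = {g: c / lengths_kb[g] for g, c in sample_counts.items()}
    scale = sum(per_kb.values()) / 1e6
    return {g: v / scale for g, v in per_kb.items()}


for name, sample_counts in counts.items():
    f = fpkm(sample_counts, lengths_kb)
    t = tpm(sample_counts, lengths_kb)
    print(f"{name}: sum FPKM = {sum(f.values()):,.0f}, sum TPM = {sum(t.values()):,.0f}")
```

With these numbers the FPKM totals are 375,000 and 575,000, while both TPM totals are 1,000,000, even though both samples have the same number of reads.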

I would recommend redoing the gene-wise DE analysis from raw counts (obtained from the BAM files or the raw reads) under a negative binomial model (edgeR or DESeq), and reporting changes as normalized fold changes alongside raw counts or CPM/TPM.
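edgeR and DESeq are R/Bioconductor packages and are what should run the actual analysis. The Python sketch below only illustrates two ingredients on a made-up count matrix (sample names, gene names and counts are hypothetical): CPM, and median-of-ratios size factors, the normalization idea used by DESeq. The fold change here is a plain ratio of mean normalized counts, not the model-based (and, in DESeq2, optionally shrunken) estimate those packages report.

```python
import math
from statistics import mean, median

# Hypothetical raw counts: genes x samples, two replicates per condition (made up).
samples = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
counts = {
    "geneA": [100, 200, 150, 300],
    "geneB": [50, 100, 200, 400],
    "geneC": [500, 1000, 250, 500],
}
ctrl_idx, trt_idx = [0, 1], [2, 3]


def cpm(counts: dict, n_samples: int) -> dict:
    """Counts per million: each count divided by its sample's total, times 1e6."""
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [c / totals[j] * 1e6 for j, c in enumerate(row)] for g, row in counts.items()}


def size_factors(counts: dict, n_samples: int) -> list:
    """Median-of-ratios size factors.

    For every gene, divide each count by the gene's geometric mean across samples;
    a sample's size factor is the median of these ratios over genes.
    Genes with a zero in any sample are skipped, since their geometric mean is zero.
    """
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = mean(math.log(c) for c in row)
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    return [median(r) for r in ratios]


sf = size_factors(counts, len(samples))
normalized = {g: [c / sf[j] for j, c in enumerate(row)] for g, row in counts.items()}
cpm_values = cpm(counts, len(samples))

for gene, row in normalized.items():
    # Naive fold change: ratio of mean normalized counts, no model, no shrinkage.
    lfc = math.log2(mean(row[j] for j in trt_idx) / mean(row[j] for j in ctrl_idx))
    print(f"{gene}: log2FC (trt vs ctrl) = {lfc:+.2f}")
```

In this toy matrix the second replicate of each condition simply has twice the sequencing depth; the size factors absorb that, so geneA comes out unchanged while geneB and geneC show the differences that were built into the counts.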

I should add that it may be an advantage to bring most or all of the computation under your own control, as it is problematic to publish data that you can neither fully understand nor vouch for.