Question

FPKM Values Differ Between Different Cuffdiff Runs Even When Same Data Is Used For One Of The Conditions

10.9 years ago

Nick ▴ 290

I have 2 conditions, each with 4 replicates. I ran tophat-cuffdiff twice: once using 2 groups of 4 replicates, and once excluding one of the replicates for one of the conditions (i.e. one condition had only 3 replicates, while the other had the same set of 4 replicates as before).

I was somewhat surprised to find out that the FPKM values differed substantially between these two analytic runs for BOTH conditions even though the data for one of the conditions was the same in both runs.

The only explanation I can think of is that cuffdiff estimates the FPKM values by pooling the data from both conditions. Does anyone know whether this is true?

cuffdiff fpkm • 4.7k views

updated 10.8 years ago by Fabio Marroni ★ 3.0k • written 10.9 years ago by Nick ▴ 290

Was this based on de-novo assembly?

10.9 years ago by Michael 54k

No. I used a genome build as a reference along with the corresponding GTF file

10.9 years ago by Nick ▴ 290

I agree that this sounds undesirable. What happens if you run it multiple times on the exact same data?

10.9 years ago by Michael 54k

I'm thinking of trying that as a sort of sanity check.

10.9 years ago by Nick ▴ 290

I stumbled upon http://seqanswers.com/forums/showthread.php?t=4606, which seems to address a similar question. Some of the posts there seem to confirm that the reported FPKM values are not absolute but, rather, incorporate some sort of normalisation across conditions. I would still find it helpful if anyone could give a more definitive confirmation.

10.9 years ago by Nick ▴ 290

It is definitely like that. I don't have a reference, but I have seen the same thing in my own runs.

10.9 years ago by Mikael Huss 4.8k

score 2 · Answer 1 · 2013-07-02

The authors of cufflinks and cuffdiff "recently" changed the default normalization method. They moved from plain FPKM to the method already used by DESeq (which is thought to be better). This method returns values that depend (slightly) on the composition of all the libraries in the run, so removing even one sample can change the values for every condition. To check, you can run cuffdiff with the option --library-norm-method classic-fpkm.

You can find more on the cufflinks/cuffdiff web page: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth
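
To see why dropping one sample can shift the values for the untouched condition too, here is a minimal sketch of DESeq-style median-of-ratios size factors. It is not cuffdiff's actual code, and the counts and sample names (A1..A4, B1..B4) are made up for illustration. Each sample's size factor is computed relative to a per-gene geometric mean taken over all samples in the run, so removing B4 changes the reference and therefore the factors for A1..A4 as well.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (the idea used by DESeq).

    counts maps sample name -> list of per-gene counts, same gene order.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean across ALL samples in this run.
    # Genes with a zero count in any sample are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy counts: 5 genes, 2 conditions x 4 replicates.
counts = {
    "A1": [100, 200, 50, 400, 30],
    "A2": [110, 190, 55, 420, 28],
    "A3": [90, 210, 45, 380, 33],
    "A4": [105, 205, 52, 410, 31],
    "B1": [100, 400, 50, 200, 30],
    "B2": [95, 410, 48, 210, 29],
    "B3": [108, 390, 53, 190, 32],
    "B4": [300, 1200, 150, 600, 90],  # a much deeper library
}

full = size_factors(counts)
reduced = size_factors({s: c for s, c in counts.items() if s != "B4"})

# Condition A data is identical in both runs, yet its size factors differ.
for s in ("A1", "A2", "A3", "A4"):
    print(f"{s}: with B4 = {full[s]:.3f}, without B4 = {reduced[s]:.3f}")
```

Since normalized values are obtained by dividing by these factors, the condition A values change between the two runs even though the condition A data did not. With a purely per-library normalization such as classic-fpkm, each sample is scaled only by its own library, so this cross-sample dependence should not arise in the same way.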