Question: Cuffdiff DE Significance Of Zero FPKM Values
6.8 years ago by
Jason10 wrote:


I used cuffdiff 1.3.0 to compare four BAM files aligned with TopHat 1.4.1, and I noticed that a zero FPKM value is never called significant when compared with a large FPKM value. Here is one example that concerns me:

XLOC_017834    XLOC_017834    Olfr1307    2:111784672-111785611    FF    FMC    OK    0.257545    46.5301    7.4972    -4.50075    6.77156e-06    0.000658507    yes
XLOC_017834    XLOC_017834    Olfr1307    2:111784672-111785611    FFC    FMC    OK    0    46.5301    1.79769e+308    1.79769e+308    0.0789649    0.365761    no
XLOC_017834    XLOC_017834    Olfr1307    2:111784672-111785611    FM    FMC    OK    0.0380371    46.5301    10.2565    -5.37179    7.79602e-08    2.39776e-05    yes

If line wrapping mangles the gene_exp.diff output above, here is a brief summary: FMC has an FPKM of 46.5301, and when it is compared to FF (FPKM=0.257545) and FM (FPKM=0.0380371) the q-value is below the FDR of 0.05, so the test is appropriately labeled significant ('yes'). However, FFC has an FPKM of zero, and that differential test is not significant (q-value 0.365761, 'no'). The goal of the experiment was to identify genes expressed only in the FMC data set. As such, an infinite log fold change seems more significant than the other comparisons, not less. Can anyone explain this?

Hi Jason,

Did you find a way to deal with this problem? I am running into the same thing: a lot of genes in my cuffdiff output have 0 FPKM in one condition and very high FPKM values in the other condition.

Thank you


Hi Jason and Molla Linda,

I am facing the same issue. How did you handle it?

6.8 years ago by
United States
seidel6.8k wrote:

How does one measure the significance of something that wasn't measured? I think what you're asking is for the authors of cuffdiff to make up a heuristic, whereby if something is measured with significance in one sample (high FPKM) but not measured at all in another, they should invent a significance value and flag that gene as significantly differentially expressed - a ratio of infinity is, after all, pretty significant. The difficulty is that there is no data for one side of the comparison, so one can't meaningfully assign a numerical significance to that particular comparison. To do so would be making something up (the friendlier term is: creating a heuristic). Getting people to agree on a heuristic or a convention for this situation may be difficult.

If you are looking for genes that meet these criteria, there's no reason why you can't make up your own heuristic for identifying them and pulling them out of the cuffdiff results. Even then, you'll have to decide what FPKM value in a given set of samples counts as significantly different from 0 in another.
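As a rough illustration of such a do-it-yourself heuristic, here is a minimal Python sketch that pulls out rows where one condition has an FPKM of exactly 0 and the other is at or above a cutoff. The function name `on_off_genes`, the `min_fpkm` parameter, and its default of 10.0 are hypothetical choices for this example, not anything cuffdiff provides. The column positions are assumed from the 14-field rows quoted in the question, so check them against your own file.

```python
import csv
import io

# Column positions assumed from the 14-field rows quoted in the question:
# ..., gene (2), locus, sample_1 (4), sample_2 (5), status (6), value_1 (7), value_2 (8), ...
GENE, SAMPLE_1, SAMPLE_2, STATUS, VALUE_1, VALUE_2 = 2, 4, 5, 6, 7, 8


def on_off_genes(handle, min_fpkm=10.0):
    """Yield (gene, silent_sample, expressed_sample, fpkm) for rows where one
    condition has FPKM == 0 and the other has FPKM >= min_fpkm.

    min_fpkm is an arbitrary, user-chosen cutoff (a heuristic, not a statistic).
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip short rows, the header line, and tests that did not finish OK.
        if len(row) < 14 or row[STATUS] != "OK":
            continue
        try:
            v1, v2 = float(row[VALUE_1]), float(row[VALUE_2])
        except ValueError:
            continue  # malformed numeric field
        if v1 == 0.0 and v2 >= min_fpkm:
            yield row[GENE], row[SAMPLE_1], row[SAMPLE_2], v2
        elif v2 == 0.0 and v1 >= min_fpkm:
            yield row[GENE], row[SAMPLE_2], row[SAMPLE_1], v1


# The three rows from the question, rebuilt with tab separators.
example = "\n".join(
    "\t".join(fields)
    for fields in [
        ["XLOC_017834", "XLOC_017834", "Olfr1307", "2:111784672-111785611", "FF", "FMC",
         "OK", "0.257545", "46.5301", "7.4972", "-4.50075", "6.77156e-06", "0.000658507", "yes"],
        ["XLOC_017834", "XLOC_017834", "Olfr1307", "2:111784672-111785611", "FFC", "FMC",
         "OK", "0", "46.5301", "1.79769e+308", "1.79769e+308", "0.0789649", "0.365761", "no"],
        ["XLOC_017834", "XLOC_017834", "Olfr1307", "2:111784672-111785611", "FM", "FMC",
         "OK", "0.0380371", "46.5301", "10.2565", "-5.37179", "7.79602e-08", "2.39776e-05", "yes"],
    ]
)

hits = list(on_off_genes(io.StringIO(example)))
print(hits)
```

On a real file you would pass an open handle instead, e.g. `open("gene_exp.diff", newline="")`. Note that this only flags candidates: the cutoff is a judgment call, and the result carries no p-value or multiple-testing correction.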

