Question: Analysis With Cufflinks (Cuffdiff) Of Genes With No Gene Expression In One Of The Samples.
Hi all,

I'm using Cufflinks/Cuffdiff to find the differentially expressed genes between two samples. For some genes, the expression level in one of the two samples is 0, and I get extreme (weird) values of the log2(fold change), I guess as a result of either dividing by 0 or taking the log2 of 0. What do you usually do with these genes? Do you discard them, or do you keep them for downstream analysis?

Here is an example of what I mean (the columns are gene name, FPKM in sample 1, FPKM in sample 2, and log2(fold change)):

OR4F16    0    0.00245667    1.79769e+308

EDITED: Would you trust the p-value obtained in these cases?



I think this situation requires case-specific analysis. Log2(0) is -Inf and dividing by an FPKM of 0 gives an infinite ratio, so the fold-change value is unhelpful but not unexpected (the 1.79769e+308 in your example is the largest finite double-precision number, presumably standing in for infinity). In some cases the gene is genuinely expressed in only one condition. That is unusual but could be extremely interesting, so I would not arbitrarily dump these genes. However, if your samples say "not expressed" versus "just barely expressed", as in your example (0 vs. 0.00245667 FPKM), then I think it is likely that the sample showing extremely modest expression is a false positive. Cases where one condition shows robust expression and the other shows no expression are better true-positive candidates, and cases where replicates show the same behavior are much more interesting. Look at the reads in IGV or a pileup, and beware of confounding biases that co-vary with the conditions you are comparing.
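To make that triage concrete, here is a minimal Python sketch of how one might sort such genes before looking at them by hand. It is not part of Cufflinks or Cuffdiff: the function names, the `min_fpkm` cutoff of 1.0, and the `pseudocount` of 0.01 are all assumptions chosen for illustration, and sensible values will depend on your data. The pseudocount only gives a finite fold change that can be sorted on; it does not make the estimate more trustworthy.

```python
import math

# Hypothetical helper names and thresholds; tune them for your own data.

def is_placeholder(log2_fc, limit=1e300):
    """True if a reported log2 fold change is infinite or an 'infinity' stand-in
    such as +/-1.79769e+308."""
    return math.isinf(log2_fc) or abs(log2_fc) > limit

def triage_gene(fpkm_1, fpkm_2, min_fpkm=1.0, pseudocount=0.01):
    """Return (finite log2 fold change, label) for one gene.

    min_fpkm    -- assumed cutoff for 'robust' expression
    pseudocount -- assumed small constant added to both FPKMs to avoid log2(0)
    """
    log2_fc = math.log2((fpkm_2 + pseudocount) / (fpkm_1 + pseudocount))
    if fpkm_1 > 0 and fpkm_2 > 0:
        label = "expressed_in_both"
    elif fpkm_1 == 0 and fpkm_2 == 0:
        label = "not_expressed"
    elif max(fpkm_1, fpkm_2) >= min_fpkm:
        # Zero in one sample, robust in the other: worth a look in IGV.
        label = "on_off_candidate"
    else:
        # Zero in one sample, barely detectable in the other.
        label = "likely_noise"
    return log2_fc, label

if __name__ == "__main__":
    # The gene from the question, plus a made-up gene for contrast.
    print(is_placeholder(float("1.79769e+308")))
    print(triage_gene(0.0, 0.00245667))
    print(triage_gene(0.0, 25.0))
```

With these assumed thresholds, OR4F16 from the question falls into the "likely_noise" group, while a gene going from 0 to a clearly non-trivial FPKM would be kept as an on/off candidate for manual inspection.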

Thanks @David for your opinion. I completely agree that these genes should not be dumped and that they need special analysis. Unfortunately, we don't have replicates, so I'm afraid we will have to analyze each gene one by one.

