Question: Statistical question (DESeq, Cuffdiff) when one condition is zero?
4.5 years ago by
manekineko130
Bulgaria
manekineko130 wrote:

Hi,
I keep asking myself, and can't answer, what to do when you test for differential expression and the expression in one of the conditions is zero. Most of the time the software hits a division by zero and discards these results (or, if the value is close to zero, you get "inf" or something similar). Is this correct?

I mean, if in one condition the gene is at 0 (not expressed) and in the other at 1000, it will be discarded. But in general a gene can genuinely be switched on, or perhaps with somewhat deeper sequencing you would get 10 and 10000, and then the gene would not be discarded and you would get a nice log2 fold change showing upregulation.

So my question is: what is the right thing to do in such cases, and what is biologically relevant/right (rather than statistically right)?

modified 4.5 years ago by dariober10k • written 4.5 years ago by manekineko130

Why would you discard it when you see `Inf` or `-Inf`?

modified 4.5 years ago • written 4.5 years ago by geek_y10k

OK, the Inf was probably a bad example, but I'm more interested in the main case, when one condition is zero.

modified 4.5 years ago • written 4.5 years ago by manekineko130

Cuffdiff seems to discard things for no reason. I don't recall DESeq2 or edgeR doing that.

4.5 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

A popular strategy to cope with zeros is to add a small number (a pseudocount) to all counts, so that you avoid division by zero without noticeably biasing the results (e.g. 1000:0 is reasonably equivalent to 1001:1). Having said that, this issue sometimes bugs me when interpreting fold-change ratios, since the choice of a small number can have a large effect that is not consistent with the biological interpretation. For example, if you add 1 to all your counts you get log2(1001/1) = 9.97; if instead you add 0.1 (biologically the same, I would argue) you get log2(1000.1/0.1) = 13.29, which is a big difference.
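To make the arithmetic above concrete, here is a minimal Python sketch of the pseudocount approach. The function name `log2_fold_change` and its `pseudocount` parameter are made up for illustration; this is not how DESeq2, edgeR or Cuffdiff compute fold changes internally, just the naive "add a small number" calculation.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Naive log2 fold change with a pseudocount added to both counts.

    Hypothetical helper for illustration only: it computes
    log2((treated + pseudocount) / (control + pseudocount)).
    """
    if pseudocount <= 0:
        # Without a positive pseudocount a zero control count
        # would cause a division by zero.
        raise ValueError("pseudocount must be positive")
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Same raw counts (1000 vs 0), two different pseudocounts.
for pc in (1.0, 0.1):
    lfc = log2_fold_change(1000, 0, pseudocount=pc)
    print(f"pseudocount={pc}: log2FC = {lfc:.2f}")
# prints:
# pseudocount=1.0: log2FC = 9.97
# pseudocount=0.1: log2FC = 13.29
```

The counts are identical in both runs, so the gap of roughly 3.3 log2 units comes entirely from the choice of pseudocount, which is the point made above.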