Question

How to interpret "pseudo count" in gene expression data handling context?

4.0 years ago

n,n ▴ 360

I am reading a paper that has a passage describing the pre-processing of gene expression data before conducting the experiments. The passage states "After conversion to a base-2 logarithm with a pseudo count of 0.125, batch normalization using ComBat was applied".

What exactly is a pseudo count? What I understood initially was that you add 0.125 to every value in your gene expression matrix and then take the logarithm of that to avoid taking the logarithm of 0 (which is not defined). This is based on my intuition though and I would like to know if this is correct and if there are other reasons why pseudo counts are used.
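A minimal NumPy sketch of the transformation as described in the quoted passage: add 0.125 to every entry, then take log2. The matrix values and variable names below are hypothetical toy data, not from the paper. Without the pseudo count, np.log2(0) would return -inf (with a runtime warning); with it, a zero becomes log2(0.125) = -3.

```python
import numpy as np

# Hypothetical toy expression matrix (genes x samples); all values are non-negative
expr = np.array([[0.0, 3.0, 10.0],
                 [1.0, 0.0, 255.0]])

pseudo = 0.125
# Add the pseudo count to every entry before the log, so zeros stay finite
log_expr = np.log2(expr + pseudo)  # a zero entry becomes log2(0.125) = -3

print(log_expr)
```

In the paper's pipeline, batch correction with ComBat would then be run on a log-scale matrix like log_expr; that step is not sketched here.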

RNA-Seq normalization • 6.1k views

updated 4.0 years ago by ATpoint 82k • written 4.0 years ago by n,n ▴ 360

score 6 · Accepted Answer · 2020-05-02

4.0 years ago

dsull ★ 5.8k

Your understanding is correct.

I personally like log2(x+1) because, with a pseudocount of 1, a zero maps to log2(1) = 0 and every non-negative value stays non-negative, so you don't have to deal with negative numbers.
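A small sketch comparing the two pseudocounts on a few made-up values (the array below is illustrative only). With +1, nothing goes below zero for non-negative input. With +0.125, zeros land at -3, and any value below 0.875 ends up negative.

```python
import numpy as np

# Illustrative non-negative expression values
values = np.array([0.0, 0.5, 1.0, 7.0])

log_pc1 = np.log2(values + 1.0)       # 0 -> 0; never negative for values >= 0
log_pc0125 = np.log2(values + 0.125)  # 0 -> -3; values below 0.875 become negative

print(log_pc1)
print(log_pc0125)
```

Besides the sign, the choice also affects the low end of the scale. A larger pseudocount pulls small values closer together after the log, while a smaller one spreads them apart.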

ADD COMMENT • link 4.0 years ago by dsull ★ 5.8k