Question

Gene expression normalization: sample-wise or feature-wise? Which one is recommended?

Asked 14 months ago by tyasird

Dear Biostars users,

I would like to ask a question about z-score normalization (standardization) of gene-expression data.
As you can see from the title, I would like to know which is the better way to normalize gene-expression data: sample-wise or feature-wise.

In the gene-expression examples I find online, people usually normalize sample-wise. However, in machine-learning examples (and most other examples), people usually normalize feature-wise.

What exactly is the difference between these two methods?

So let's say we have a DataFrame like this:

         sample_0  sample_1  sample_2  sample_3
gene0         5.1       3.5       1.4       0.2
gene1         4.9       3.0       1.4       0.2
gene2         4.7       3.2       1.3       0.2
gene3         4.6       3.1       1.5       0.2
gene4         5.0       3.6       1.4       0.2
...           ...       ...       ...       ...
gene145       6.7       3.0       5.2       2.3
gene146       6.3       2.5       5.0       1.9
gene147       6.5       3.0       5.2       2.0
gene148       6.2       3.4       5.4       2.3
gene149       5.9       3.0       5.1       1.8

This is the sample-wise z-score normalization (subtract each sample's mean and divide by that sample's standard deviation):

          sample_0   sample_1   sample_2   sample_3
gene0    -0.900681   1.019004  -1.340227  -1.315444
gene1    -1.143017  -0.131979  -1.340227  -1.315444
gene2    -1.385353   0.328414  -1.397064  -1.315444
gene3    -1.506521   0.098217  -1.283389  -1.315444
gene4    -1.021849   1.249201  -1.340227  -1.315444
...            ...        ...        ...        ...
gene145   1.038005  -0.131979   0.819596   1.448832
gene146   0.553333  -1.282963   0.705921   0.922303
gene147   0.795669  -0.131979   0.819596   1.053935
gene148   0.432165   0.788808   0.933271   1.448832
gene149   0.068662  -0.131979   0.762758    0.79067
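For reference, here is a minimal sketch of the sample-wise version in plain Python (standard library only; the name `zscore_columns` is just illustrative). A z-score subtracts the mean *and* divides by the standard deviation; this sketch uses the population SD (ddof=0). Only the ten genes displayed above are used as input, so the output will not reproduce the table (which was computed over all 150 genes); the check is simply that every sample (column) ends up with mean 0 and SD 1. This also explains part of the naming confusion: machine-learning code usually puts samples in rows and features in columns, so scikit-learn's `StandardScaler` standardizes each column. Applied directly to a genes × samples DataFrame like this one, it therefore gives the sample-wise result.

```python
import statistics


def zscore_columns(rows):
    """Sample-wise z-score: standardize each column (sample) of a genes x samples matrix.

    Uses the population standard deviation (ddof=0).
    """
    n_rows, n_cols = len(rows), len(rows[0])
    standardized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in rows]
        mu = statistics.mean(column)
        sd = statistics.pstdev(column)
        for i in range(n_rows):
            standardized[i][j] = (column[i] - mu) / sd
    return standardized


# Only the ten genes displayed in the question (gene0-gene4, gene145-gene149),
# so the result differs from the table, which was computed over all 150 genes.
expr = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],
    [6.3, 2.5, 5.0, 1.9],
    [6.5, 3.0, 5.2, 2.0],
    [6.2, 3.4, 5.4, 2.3],
    [5.9, 3.0, 5.1, 1.8],
]
scaled = zscore_columns(expr)
```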

And this is the feature-wise z-score normalization (subtract each feature's (gene's) mean and divide by that gene's standard deviation):

          sample_0   sample_1   sample_2   sample_3
gene0     1.351023   0.503322  -0.609285  -1.245060
gene1     1.431365   0.354298  -0.552705  -1.232958
gene2     1.358472   0.491362  -0.606977  -1.242858
gene3     1.358655   0.452885  -0.513270  -1.298270
gene4     1.311925   0.562254  -0.615801  -1.258377
...            ...        ...        ...        ...
gene145   1.370869  -0.742554   0.514076  -1.142391
gene146   1.321102  -0.792661   0.597972  -1.126413
gene147   1.311682  -0.662893   0.578268  -1.227057
gene148   1.208577  -0.596232   0.692918  -1.305264
gene149   1.195060  -0.582209   0.704779  -1.317631
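The feature-wise version differs only in the axis: each row (gene) is standardized across the samples. Here is a minimal stdlib-only sketch (the function name is illustrative). Applied to gene0 alone, it should reproduce the first row of the table above to six decimals, which shows those numbers use the population SD (ddof=0). Be aware that pandas `DataFrame.std()` defaults to ddof=1, whereas `scipy.stats.zscore` defaults to ddof=0. With only four samples per gene, the two conventions give noticeably different z-scores.

```python
import statistics


def zscore_rows(rows):
    """Feature-wise z-score: standardize each row (gene) across the samples.

    Uses the population standard deviation (ddof=0). A gene with identical
    values in every sample has SD 0 and would raise ZeroDivisionError here.
    """
    standardized = []
    for row in rows:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        standardized.append([(value - mu) / sd for value in row])
    return standardized


gene0 = [5.1, 3.5, 1.4, 0.2]
z_gene0 = zscore_rows([gene0])[0]
print([round(z, 6) for z in z_gene0])
```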

150 rows × 4 columns

normalization gene-expression

Updated 14 months ago by Ram


I think you need to be clear on the difference between normalisation and standardisation. A z-score is not a good way to normalise gene-expression data. However, in some circumstances it can be useful to standardise data that has already been normalised. There is no single recommended way (nor a general rule on whether to standardise at all); it depends on the purpose of your analysis.

Comment by i.sudbery, 14 months ago
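To illustrate the point about normalising first and only then (optionally) standardising, here is a minimal sketch that assumes raw read counts as input. The normalisation step is library-size scaling to log2(CPM + 1); this is one common choice, assumed here for illustration, not the only option. It is followed by a per-gene z-score across samples, as is often done for heatmaps. The counts and function names are hypothetical. sample_3 has exactly twice sample_0's counts for every gene, as if the same library had been sequenced twice as deeply, so after normalisation those two samples become identical.

```python
import math
import statistics


def log_cpm(counts):
    """Normalise a genes x samples count matrix to log2(counts per million + 1)."""
    library_sizes = [sum(column) for column in zip(*counts)]
    return [
        [math.log2(c / lib * 1e6 + 1) for c, lib in zip(row, library_sizes)]
        for row in counts
    ]


def standardise_genes(matrix):
    """Z-score each gene (row) across samples, using the population SD (ddof=0)."""
    result = []
    for row in matrix:
        mu, sd = statistics.mean(row), statistics.pstdev(row)
        result.append([(v - mu) / sd for v in row])
    return result


# Hypothetical raw counts (3 genes x 4 samples). sample_3 has exactly twice
# sample_0's counts for every gene, i.e. the same profile sequenced twice as deeply.
counts = [
    [100, 150, 80, 200],
    [500, 480, 900, 1000],
    [40, 60, 35, 80],
]
normalised = log_cpm(counts)                  # step 1: normalise for library size
standardised = standardise_genes(normalised)  # step 2: optional per-gene standardisation
```

Note that the per-gene z-scores discard each gene's absolute expression level, so they are not a substitute for the normalised values themselves.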


Side note: the word is subtract, not substract; there's no s in the middle. I've corrected it in your post.

Comment by Ram, 14 months ago