Question: z score transformation by population or by gene?
Pietro100 wrote:

When calculating z-scores for microarray or RNA-Seq data, I have found two main suggestions for how to obtain them.

For example, in `R`, having a log2 expression matrix `x` with genes in rows and samples in columns, I would do:

```r
zscore <- function(x) {
  z <- (x - mean(x)) / sd(x)
  return(z)
}
```

But many people suggest using the base R function `scale` on the transposed matrix, like this:

```r
mat_zscore <- t(scale(t(x)))
```

If I am not mistaken, the two approaches are different: in the first one I subtract the mean and divide by the SD of the whole matrix (the "population"), while the second one uses `scale`, which operates by column by default, so the transposition is done to calculate the mean and SD for each gene (row).

My question is, is one of the two more correct than the other? And why are both given as valid alternatives?

Thanks

modified 14 months ago by Kevin Blighe (63k) • written 14 months ago by Pietro100
Kevin Blighe (63k) wrote:

They should give the same values. Here is my proof, taking the scaling functions from `pheatmap()` and `heatmap.2()` and comparing them to `scale()`, in the thread "cannot replicate the pheatmap scale function".

Keep in mind that we usually scale either by row or by column. Your function scales by the global mean and global standard deviation. In a typical transcriptomics setting, with genes in rows, `scale(t(x))` scales each gene, i.e., each row of `x`; the result is transposed, hence the outer `t()` in `t(scale(t(x)))`.
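To make the difference concrete, here is a minimal sketch on a made-up 3 x 4 matrix. The values and the names `z_global` and `z_gene` are only for illustration and do not come from any real dataset. It computes the global z-score from the question and the per-gene z-score via `scale()`, then checks what each one standardises.

```r
# Toy log2 expression matrix: 3 genes (rows) x 4 samples (columns).
# The numbers are invented purely for illustration.
x <- matrix(c( 1,  2,  3,  4,
              10, 10, 12, 12,
               5,  7,  9, 11),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Global z-score: one mean and one SD computed over all values in the matrix
z_global <- (x - mean(x)) / sd(x)

# Per-gene z-score: scale() works on columns, so transpose, scale, transpose back
z_gene <- t(scale(t(x)))

# Per-gene scaling: every row has mean 0 and SD 1
rowMeans(z_gene)
apply(z_gene, 1, sd)

# Global scaling: only the matrix as a whole has mean 0 and SD 1,
# so differences in expression level between genes are retained
rowMeans(z_global)
mean(z_global)
sd(z_global)
```

With the global version, a highly expressed gene stays "high" in every sample, whereas the per-gene version only shows how each sample deviates from that gene's own average.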

Kevin

My question was more like: "Is it better to scale by global or by gene mean and SD?"

Can you show an example where global mean and global sdev were used?

In this thread: "How transform FPKM values to Z-score using R".

Or do you mean an article?

Both answers in that thread are old, and the answers by Seán and dariober are different, as you have also highlighted in your question.

The `scale()` function always scales by column only (you can get it to scale by row with `t(scale(t(x)))`), so each column in the data is scaled separately. Scaling each column or row separately, rather than globally, may be more favourable in certain situations, e.g., for visualisation. However, I have never seen a comprehensive review of why one approach would be preferable to the other. You may receive a better answer by posting on Cross Validated.
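As a rough check of the column-wise behaviour, the sketch below uses simulated data (the matrix and the names `z_col`, `z_row` and `z_manual` are assumptions for illustration). It shows that `scale(x)` standardises each column, and that `t(scale(t(x)))` agrees with a manual row-wise calculation.

```r
set.seed(1)

# Simulated log2-like expression values: 5 genes (rows) x 4 samples (columns)
x <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# scale() centres and scales each column (here: each sample)
z_col <- scale(x)
colMeans(z_col)       # numerically zero for every column
apply(z_col, 2, sd)   # 1 for every column

# Row-wise (per-gene) scaling via the double transpose
z_row <- t(scale(t(x)))

# The same thing written out by hand: vectors of length nrow(x) are recycled
# down the columns, so each row is matched with its own mean and SD
z_manual <- (x - rowMeans(x)) / apply(x, 1, sd)

# Compare values only, ignoring the extra attributes that scale() attaches
all.equal(z_row, z_manual, check.attributes = FALSE)
```

The manual version relies on R's column-major recycling, which only lines up correctly because the vectors have one entry per row.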