Question: NaN in Z.score data from TCGA
0
olive121260 wrote:

I have downloaded RNA-Seq expression z.scores from TCGA datasets on cBP. Some genes have a z.score value of NaN. Does that mean that the expression level of that gene was 0, or does it mean something else? I couldn't find the answer online. Thanks for your help!

3
German.M.Demidov1.8k wrote:

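By default, a z-score is computed by centering the values (subtracting the mean) and then dividing by the standard deviation. My guess is that the standard deviation was 0 for those genes, so the calculation becomes 0/0, which is NaN. That happens when the expression level is 0 in every sample used for the calculation. It can also happen for any other constant value, but anything other than all zeros looks unrealistic.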
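As a minimal sketch of that reasoning in R, the snippet below applies a hand-written z-score to two made-up genes. The vectors and the helper name `z_score` are hypothetical, chosen only for illustration, and this is not necessarily how cBP computes its values.

```r
# Hypothetical expression values for two genes across four samples
gene_all_zero <- c(0, 0, 0, 0)   # zero in every sample
gene_one_zero <- c(0, 5, 4, 4)   # zero in one sample only

# Manual z-score: centre on the mean, then divide by the standard deviation
z_score <- function(v) (v - mean(v)) / sd(v)

z_score(gene_all_zero)  # sd is 0, so every value is 0 / 0, i.e. NaN
z_score(gene_one_zero)  # sd is non-zero, so the 0 gets a finite, negative z-score
```

So a single 0 is not enough to give NaN; the whole gene has to be constant.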

2

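Indeed, a value of 0 does not by itself produce NaN: 0 is still useful information and, on the Z-scale, it can map to any value depending on the mean and standard deviation. For example, if we calculate Z-scores from the global mean and standard deviation of a small test matrix: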

```
x
     col1 col2
[1,]    0  435
[2,]    5  346
[3,]    4   65
[4,]    4    3

(x - mean(x)) / sd(x)
           col1       col2
[1,] -0.6073070  1.8444661
[2,] -0.5791257  1.3428389
[3,] -0.5847620 -0.2409501
[4,] -0.5847620 -0.5903982
```

As kuckunniwid implies, there must be another reason why NaN was produced, most likely constant expression values, i.e. zero variance. Here, we Z-transform by row in a case where row 1 is all zeros and row 4 has a constant expression of 4:

```
x
     col1 col2
[1,]    0    0
[2,]    5  346
[3,]    4   65
[4,]    4    4

t(scale(t(x)))
           col1      col2
[1,]        NaN       NaN
[2,] -0.7071068 0.7071068
[3,] -0.7071068 0.7071068
[4,]        NaN       NaN
```
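If you want to check which rows would end up as NaN before scaling, and whether they are all zeros or just constant, something along these lines should work. It is only a sketch on the same toy matrix; the variable names `row_sd`, `zero_var` and `all_zero` are made up for the example.

```r
# Toy matrix: rows are genes, columns are samples (same values as above)
x <- matrix(c(0, 5, 4, 4,
              0, 346, 65, 4),
            ncol = 2,
            dimnames = list(NULL, c("col1", "col2")))

row_sd   <- apply(x, 1, sd)        # per-row standard deviation
zero_var <- row_sd == 0            # TRUE where row-wise scaling would give NaN
all_zero <- rowSums(x != 0) == 0   # TRUE where the row is 0 in every column

# Summary: rows 1 and 4 have zero variance, but only row 1 is all zeros
data.frame(row = seq_len(nrow(x)), zero_var, all_zero)

# Z-transform only the rows that have non-zero variance
z <- t(scale(t(x[!zero_var, , drop = FALSE])))
z
```

With real data you may prefer a small tolerance instead of `row_sd == 0`, since floating-point values that are almost constant will not give exactly 0.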