NaN in Z.score data from TCGA
3.1 years ago
JJDollar ▴ 130

I have downloaded RNA-Seq expression z-scores for TCGA datasets from cBP. Some genes have a z-score of NaN. Does that mean the expression level of that gene was 0, or does it mean something else? I couldn't find the answer online. Thanks for your help!

tcga zscore • 1.7k views
3.1 years ago

By default, a z-score is computed by centering on the mean and then dividing by the standard deviation. My guess is that the standard deviation was 0. That happens when the expression level is 0 in every sample. It can also happen in other situations, but anything other than all zeros looks unrealistic.
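To make the guess above concrete, here is the standard z-score formula written out. The symbols (x_i for one sample's expression value, x̄ for the mean, s for the sample standard deviation, n for the number of samples) are notation introduced here for illustration. This is not a statement of how the z-scores in the download were actually computed.

```latex
z_i = \frac{x_i - \bar{x}}{s},
\qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}

% If every x_j equals the same constant c (for example c = 0):
\bar{x} = c, \qquad s = 0, \qquad z_i = \frac{c - c}{0} = \frac{0}{0}
```

In floating-point arithmetic 0/0 is NaN, so any gene whose values are identical across all samples would give NaN under this formula.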


Indeed, a single value of 0 can map to any value on the Z-scale, since 0 is still useful information. For example, if we calculate Z-scores using the global mean and standard deviation:

x
     col1 col2
[1,]    0  435
[2,]    5  346
[3,]    4   65
[4,]    4    3

(x - mean(x)) / sd(x)
           col1       col2
[1,] -0.6073070  1.8444661
[2,] -0.5791257  1.3428389
[3,] -0.5847620 -0.2409501
[4,] -0.5847620 -0.5903982

As kuckunniwid implies, there are other ways a NaN can be produced, most likely constant expression values, i.e. zero variance. Here, we Z-transform by row in a case where row 1 is all zeros and row 4 has a constant expression of 4:

x
     col1 col2
[1,]    0    0
[2,]    5  346
[3,]    4   65
[4,]    4    4

t(scale(t(x)))
           col1      col2
[1,]        NaN       NaN
[2,] -0.7071068 0.7071068
[3,] -0.7071068 0.7071068
[4,]        NaN       NaN
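
As a rough sketch of how one might check this on real data, the snippet below flags rows with zero standard deviation and confirms that they are the rows that come out as NaN after row-wise scaling. The matrix is the toy example from above. The names `row_sd`, `constant_rows`, and `nan_rows` are made up for illustration, and the layout (genes in rows, samples in columns) is an assumption.

```r
# Toy matrix from the example above: genes in rows, samples in columns
x <- matrix(c(0, 5, 4, 4,
              0, 346, 65, 4),
            ncol = 2,
            dimnames = list(NULL, c("col1", "col2")))

# Row-wise standard deviation; 0 means the gene is constant across samples
row_sd <- apply(x, 1, sd)
constant_rows <- which(row_sd == 0)

# Row-wise z-scores, computed the same way as above
z <- t(scale(t(x)))

# Rows whose z-scores are all NaN
nan_rows <- which(apply(is.nan(z), 1, all))

print(constant_rows)
print(nan_rows)
```

If the two sets of row indices agree, the NaN values come from zero variance rather than from zero expression as such.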