Question

Obtaining the z-score in only tumor RNAseq data

0

Entering edit mode

4.0 years ago

deathmagnetic20 ▴ 10

Hi Biostars community,

I am currently doing survival and other analyses in RNAseq gene expression data using the ACC TCGA database.

How can i change the code for obtaining the z-score applying this formula?

z = [(value gene X in tumor Y)-(mean gene X in tumor)]/(standard deviation X in tumor)

since ACC data has no normal values, so no n_index. I solved voom transformation but I am stucked with this.

    scal <- function(x,y){
  mean_n <- rowMeans(y)  # mean of normal
  sd_n <- apply(y,1,sd)  # SD of normal
  # z score as (value - mean normal)/SD normal
  res <- matrix(nrow=nrow(x), ncol=ncol(x))
  colnames(res) <- colnames(x)
  rownames(res) <- rownames(x)
  for(i in 1:dim(x)[1]){
    for(j in 1:dim(x)[2]){
      res[i,j] <- (x[i,j]-mean_n[i])/sd_n[i]
    }
  }
  return(res)
}
z_rna <- scal(rna_vm[,t_index],rna_vm[,n_index])

Can I just t(scale(t(rna_vm)))? I guess Its not the solution...

Sorry for the naive question. Im kinda new in R and bioinformatics. Thank you so much, Alex

RNA-Seq rna-seq R gene • 1.8k views

ADD COMMENT • link updated 4.0 years ago by Kevin Blighe 87k • written 4.0 years ago by deathmagnetic20 ▴ 10

score 1 · Answer 1 · 2020-04-16

1

Entering edit mode

4.0 years ago

Kevin Blighe 87k

Yes, this..

z = [(value gene X in tumor Y)-(mean gene X in tumor)]/(standard deviation X in tumor)

..is the same as:

t(scale(t(rna_vm)))

..assuming that, for your object rna_vm, the genes are rows, and samples are columns.

---------

However, if we wanted to obtain the 'global' Z-score, then it would be different. The calculation for global Z score would be:

(rna_vm - mean(rna_vm, na.rm = TRUE)) / sd(rna_vm, na.rm = TRUE)

Please check the output of ?scale in order to understand what it's doing, specifically.

Kevin

ADD COMMENT • link 4.0 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you very much Kevin, your reply has been very helpful to me. I have been doing some research and I still don't know why one approach would be more favorable rather than the other one. Scaling by row shows me very low expression values, so I tried by global and I get a median z-score of my gene of interest closer to the median of my gene in other databases (with normal samples), so I will keep global as a better approach..

Alex

ADD REPLY • link 4.0 years ago by deathmagnetic20 ▴ 10

1

Entering edit mode

I think that we usually 'scale per gene' (i.e., the t(scale(t(x))) example) for visualisation purposes, e.g., heatmaps, or where we are doing univariate analysis, like survival analysis with gene expression.

The global Z-score makes a lot more sense in other scenarios, obviously.

ADD REPLY • link 4.0 years ago by Kevin Blighe 87k

1

Entering edit mode

I agree. The problem is that scaling per gene extremely stratify the samples (after create an event vector) over the median expression gene of interest value, so almost the entire samples are "high expression". I'm not completely sure if global z-score is a bad practice for survival analysis with gene expression.

ADD REPLY • link 4.0 years ago by deathmagnetic20 ▴ 10