Question: Calculation of Z-scores from isoform expression data
sumithra.das10 wrote (2.4 years ago):

Dear all,

I'm using level 3 RSEM-normalized data from GDAC Firehose.

I want to compare the expression of an isoform across the tumor samples of a cancer type by calculating Z-scores. My question is: to calculate the Z-score of an isoform (X), what should the reference population be, and how should I calculate its mean and standard deviation? That is, should I use the mean and standard deviation of all isoforms in the library, or the mean and standard deviation of isoform X alone across all tumor samples?

Which formula should I use?

Z_Score = ((expression of X in a tumor sample s) - (mean expression of X across all tumor samples)) / (standard deviation of X across all tumor samples)

or

Z_Score = ((expression of X in a tumor sample s) - (mean expression of all isoforms (+73K) across all tumor samples)) / (standard deviation of all isoforms across all tumor samples)
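A minimal R sketch of the two candidate formulas, with the mean subtracted before dividing by the standard deviation. The matrix `expr`, its size and its random values are hypothetical stand-ins for the Firehose data, and the layout (isoforms in rows, samples in columns) is an assumption.

```r
# Hypothetical data: rows = isoforms, columns = tumor samples
set.seed(1)
expr <- matrix(rexp(4 * 5, rate = 0.1), nrow = 4,
               dimnames = list(paste0("iso", 1:4), paste0("s", 1:5)))
x <- expr["iso1", ]  # treat iso1 as isoform X

# Formula 1: reference = isoform X itself across all tumor samples
z1 <- (x - mean(x)) / sd(x)

# Formula 2: reference = every isoform in every sample, pooled into one mean and one sd
z2 <- (x - mean(expr)) / sd(as.vector(expr))

# Both are linear rescalings of x with constants shared by all samples,
# so the ordering of samples is unchanged; only z1 has mean 0 and sd 1
z1
z2
```

Because formula 2 subtracts and divides by the same two numbers in every sample, it does not change how the samples compare to each other for X; it only changes the units.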

Thanks!

sumithra


Why don't you use the `scale` function in R?

http://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html

Sorry, but how does scaling and centering help with the Z-score calculation? Please explain.

Thanks.

Z-score = centering and scaling: subtract the mean, then divide by the standard deviation.
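A quick check in R that `scale()` computes exactly this: subtracting the mean and dividing by `sd()` gives the same numbers. The values are made up for illustration.

```r
x <- c(12.0, 30.5, 8.2, 19.9, 25.4)  # hypothetical expression values of one isoform

manual    <- (x - mean(x)) / sd(x)  # center, then scale by hand
via_scale <- as.vector(scale(x))    # scale() returns a one-column matrix, so flatten it

all.equal(manual, via_scale)
```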

Thank you, but for scaling and centering, what should my reference population be?

What exactly do you mean by reference population?

The `scale` function works per column: for each column it calculates the mean and the standard deviation, and then converts every value in that column to a z-score by subtracting the column mean and dividing by the column standard deviation.
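A small sketch of the per-column behaviour, using a hypothetical matrix whose columns have very different means and spreads. After `scale()`, every column has mean 0 and standard deviation 1, and the values used are stored as attributes.

```r
set.seed(42)
# Hypothetical matrix: three columns on very different scales
m <- cbind(a = rnorm(10, mean = 5,   sd = 1),
           b = rnorm(10, mean = 50,  sd = 10),
           c = rnorm(10, mean = 500, sd = 100))

z <- scale(m)  # each column centered on its own mean and divided by its own sd

colMeans(z)               # ~0 for every column, up to floating-point error
apply(z, 2, sd)           # 1 for every column
attr(z, "scaled:center")  # the column means that were subtracted
attr(z, "scaled:scale")   # the column standard deviations used as divisors
```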

If you want row z-scores, you'll have to transpose your matrix first:

```r
# scale() works per column, so transpose first to standardize the rows instead
transposed_matrix <- t(matrix)
```
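Putting the transpose together with `scale`, a minimal sketch of row-wise z-scores, assuming isoforms are rows and samples are columns (check the orientation of your own file). `expr` and its values are hypothetical. The second `t()` puts the result back in the original orientation, and each row then matches the first formula in the question applied to that isoform.

```r
# Hypothetical expression matrix: rows = isoforms, columns = tumor samples
set.seed(7)
expr <- matrix(rexp(3 * 6, rate = 0.05), nrow = 3,
               dimnames = list(c("isoX", "isoY", "isoZ"), paste0("s", 1:6)))

# Transpose, scale (per column = per isoform), then transpose back:
# each isoform is standardized against its own mean and sd across samples
row_z <- t(scale(t(expr)))

row_z["isoX", ]  # z-scores of isoform X in each sample
```

An isoform with zero variance across samples (for example, all zeros) gets NaN, because its centered values are divided by a standard deviation of 0; such rows are usually filtered out before scaling.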

Thanks for that. The data I'm working with are normalized within each sample, so to compare the expression of a single isoform (X) across different samples I wanted to calculate Z-scores. To do so, should I use only the mean and standard deviation of X, or the mean and standard deviation of all isoforms (+73K isoforms) across all samples?

Sorry, but I don't fully understand what your data look like. What does +73K isoforms mean? More than 73,000 isoforms? Why do you want to include them in your z-scores?