Question: How to calculate moderated z-score (MODZ)
2
2.3 years ago by
Gema Sanz70
Karolinska Institutet
Gema Sanz70 wrote:

Hi,

I'm trying to reproduce the calculation of moderated z-score (MODZ) described in PMID: 29195078 for some microarray data. Calculation is described as follows:

Replicate-consensus signatures (MODZ) L1000 experiments are typically done in 3 or more biological replicates. We derive a consensus replicate signature by applying the moderated z-score (MODZ) procedure as follows. First, a pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates, normalized such that all weights sum to 1. Finally, the consensus signature is given by the linear combination of the replicate signatures with the coefficients set to the weights. This procedure serves to mitigate the effects of uncorrelated or outlier replicates, and can be thought of as a ‘de-noised’ representation of the given experiment’s transcriptional consequences.

I already calculated robust z-score for each gene (n=900) at every sample (n=6, 3 controls, 3 treatments) and I used the transposed matrix (genes as columns, samples as rows) to calculate pairwise spearman correlation using:

``````cor<- cor(tm, use="pairwise.complete.obs","spearman")
``````

but I'm sure I'm missing something because I get error

``````Error in cor(tm, use = "pairwise.complete.obs", "spearman") :
'y' must be numeric
``````

I don't know how to define x or y (I'm quite newby with R), but as I understand, I have to correlate control 1 to control 2, control 2 to control 3 and control 1 to control 3 individually and same for treatments and then compute the weights.

Any ideas about the code or how to proceed will be very much welcome!!!

Gema

modz z-score • 1.9k views
modified 13 months ago by Marouen Ben Guebila0 • written 2.3 years ago by Gema Sanz70
2

You could try:

``````cor <- cor(data.matrix(tm), use="pairwise.complete.obs", "spearman")
``````

Your tm object should only contain numerical values. Most likely it is currently a data-frame object, which is not specifically numerical.

1

Wait, cross-post: C: correlation between genes

1

If you want pairwise sample correlations, then your samples should be columns, with genes as rows

yes it was me, but I wanted to post the whole thing to make things clear about what I need to do

I just get the same error with that code. The only way I found to skip that error and make the code work is to set y as tm, but I don't know the meaning of that

y<- tm cor<- cor(tm, y, use="pairwise.complete.obs","spearman")

Can you display a small version of tm (like first 10 genes)?

2

There is no issue setting y as tm. It just means that it will correlate tm to tm.

If you want gene-to-gene correlations, then:

``````cor(tm, tm, use="pairwise.complete.obs", "spearman")
``````

If you want sample-to-sample correlations, then:

``````cor(t(tm), t(tm), use="pairwise.complete.obs", "spearman")
``````

Sorry, I reached the limit of coments per 6h... this is an example matrix of my data, I have 6 different samples and 928 genes, it is already trasposed

https://imgur.com/a/iZtsm

what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that.

Actually, this is a bit weird because the weighted average of zscores isn´t a zscore.

1

Why did you post this as an answer?

1

yes sorry it was just a thought I did not answer the question.

No problem - no need to worry at this point.

3
2.3 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

As just mentioned on your other post, you need `method="spearman"`, since otherwise R is seeing the unnamed argument as the thing you should be correlating `tm` with. Also, you want `t(tm)`, as Kevin mentioned.

1

Good spot on "spearman" Devon.

``````cor(tm, use="pairwise.complete.obs", method="spearman")
``````

or

``````cor(t(tm), use="pairwise.complete.obs", method="spearman")
``````

Thanks! that solved the "y" problem. As I wrote above what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that, I guess I should do a correlation using x for one sample and y for another and do it with all possible comparisons... ?

pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates

From the text I understand that I need one correlation value per gene and per sample and then combine those from replicates.

I have no idea how to do that, they have a script in MATLAB to do it, but I don't know how to extrapolate into R. This is the MATLAB code:

``````function modz_ds = level4_to_level5(zsrep_file, col_meta_file, landmark_file, group_var)
% LEVEL4_TO_LEVEL5 Compute Moderated Z-scores (ModZ) from replicate signatures

% zsrep_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF_n67x22268.gctx';
% col_meta_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF.map';
% landmark_file = '/Users/narayan/workspace/cmapM/data_processing/resources/L1000_EPSILON.R2.chip';

% z-scores for all replicate signatures
zsrep = parse_gctx(zsrep_file);
% sample annotations
col_meta = parse_tbl(col_meta_file, 'outfmt', 'record');
% landmark annotations
chip = parse_tbl(landmark_file, 'outfmt', 'record');

%% Generate moderated z-score (ModZ) signatures

% Exclude large outlier zscores
zsrep.mat = clip(zsrep.mat, -10, 10);

% Landmark features
pr_id_lm = {chip(strcmp('LM', {chip.pr_type})).pr_id}';
[~, lm_ridx] = intersect(zsrep.rid, pr_id_lm);

% column ids
cid = {col_meta.cid}';

% Group samples
[rep_gp, rep_idx] = getcls({col_meta.(group_var)}');

num_gp = length(rep_gp);
[num_row, num_col] = size(zsrep.mat);
modz_mat = nan(num_row, num_gp);
for ii=1:num_gp
this_gp = rep_idx == ii;
this_zs = zsrep.mat(:, this_gp);
fprintf(1, '%d/%d %s Computing ModZS %d replicates\n', ii, num_gp, rep_gp{ii}, nnz(this_gp));
% determine weights based on replicate correlations in landmark space
[modz_mat(:, ii), wt, cc] = modzs(this_zs, lm_ridx);
fprintf(1, 'Replicate correlations\n');
disp(cc);
fprintf(1, 'Replicate weights: ');
disp(wt');
end
% Annotate ModZ matrix
modz_ds = mkgctstruct(modz_mat, 'rid', zsrep.rid, 'cid', rep_gp);
[~, uidx] = unique(rep_idx, 'stable');
modz_meta = keepfield(col_meta(uidx), {'rna_well', 'pert_id',...
'pert_iname', 'pert_type',...
'cell_id','pert_idose',...
'pert_itime'});
modz_ds = annotate_ds(modz_ds, modz_meta);
%modz_ds = annotate_ds(modz_ds, row_meta, 'dim', 'row', 'keyfield', 'pr_id');
end
``````
1

I like how obtuse their description is. The actual matlab code is here, which in R is:

``````modzs <- function(m, clipLowWt=TRUE, lowThreshWT=0.1, clipLowCC=TRUE, lowThreshCC=0, metric="avg") {
cc = cor(m, method="spearman") # pair-wise correlations
if(clipLowCC) { # Trim low values
cc[cc<lowThreshCC] = lowThreshCC
}

wt = 0.5 * colSums(cc) # Per-sample values
if(clipLowWt) { # Trim low values
wt[wt<lowThreshWT] = lowThreshWT
}

# Normalize the weights
if(metric=="avg") {
sumWT = sum(abs(wt))
} else {
sumWT = sqrt(sum(wt^2))
}
normWT = wt / sumWT  # Normalized weights

# Return the scaled input values
return(m * normWT)
}
``````

I've set the defaults of `modzs()` to match the ones used in matlab. I've not tried this on real data, so I don't have a good sense how reasonable it is.

Note that the input matrix, `m`, has genes as rows and samples as columns.

1

I think it is working! The function doesn't give errors and I got something that I think is the expected output

https://ibb.co/eLtkSc

wow, thanks A LOT!!! this is a start point, I will play around with it! :)

Do you code in MatLab Devon? - I don't!

1

Not since college, which was quite a while back :P