Question: How to calculate moderated z-score (MODZ)
0
gravatar for Gema Sanz
4 months ago by
Gema Sanz50
Karolinska Institutet
Gema Sanz50 wrote:

Hi,

I'm trying to reproduce the calculation of moderated z-score (MODZ) described in PMID: 29195078 for some microarray data. Calculation is described as follows:

Replicate-consensus signatures (MODZ) L1000 experiments are typically done in 3 or more biological replicates. We derive a consensus replicate signature by applying the moderated z-score (MODZ) procedure as follows. First, a pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates, normalized such that all weights sum to 1. Finally, the consensus signature is given by the linear combination of the replicate signatures with the coefficients set to the weights. This procedure serves to mitigate the effects of uncorrelated or outlier replicates, and can be thought of as a ‘de-noised’ representation of the given experiment’s transcriptional consequences.

I already calculated robust z-score for each gene (n=900) at every sample (n=6, 3 controls, 3 treatments) and I used the transposed matrix (genes as columns, samples as rows) to calculate pairwise spearman correlation using:

cor<- cor(tm, use="pairwise.complete.obs","spearman")

but I'm sure I'm missing something because I get error

Error in cor(tm, use = "pairwise.complete.obs", "spearman") : 
  'y' must be numeric

I don't know how to define x or y (I'm quite newby with R), but as I understand, I have to correlate control 1 to control 2, control 2 to control 3 and control 1 to control 3 individually and same for treatments and then compute the weights.

Any ideas about the code or how to proceed will be very much welcome!!!

Gema

modz z-score • 310 views
ADD COMMENTlink modified 4 months ago by Devon Ryan81k • written 4 months ago by Gema Sanz50
2
gravatar for Kevin Blighe
4 months ago by
Kevin Blighe21k
University College London Cancer Institute
Kevin Blighe21k wrote:

You could try:

cor <- cor(data.matrix(tm), use="pairwise.complete.obs", "spearman")

Your tm object should only contain numerical values. Most likely it is currently a data-frame object, which is not specifically numerical.

ADD COMMENTlink written 4 months ago by Kevin Blighe21k
1

Wait, cross-post: C: correlation between genes

ADD REPLYlink written 4 months ago by Kevin Blighe21k
1

If you want pairwise sample correlations, then your samples should be columns, with genes as rows

ADD REPLYlink written 4 months ago by Kevin Blighe21k

yes it was me, but I wanted to post the whole thing to make things clear about what I need to do

ADD REPLYlink written 4 months ago by Gema Sanz50

I just get the same error with that code. The only way I found to skip that error and make the code work is to set y as tm, but I don't know the meaning of that

y<- tm cor<- cor(tm, y, use="pairwise.complete.obs","spearman")

ADD REPLYlink written 4 months ago by Gema Sanz50

Can you display a small version of tm (like first 10 genes)?

ADD REPLYlink written 4 months ago by Kevin Blighe21k
2

There is no issue setting y as tm. It just means that it will correlate tm to tm.

If you want gene-to-gene correlations, then:

cor(tm, tm, use="pairwise.complete.obs", "spearman")

If you want sample-to-sample correlations, then:

cor(t(tm), t(tm), use="pairwise.complete.obs", "spearman")

Take a look also at my parallel processing version of this: https://github.com/kevinblighe/clusGapKB (go to the end of the page)

ADD REPLYlink written 4 months ago by Kevin Blighe21k

Sorry, I reached the limit of coments per 6h... this is an example matrix of my data, I have 6 different samples and 928 genes, it is already trasposed

https://imgur.com/a/iZtsm

what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that.

ADD REPLYlink modified 4 months ago • written 4 months ago by Gema Sanz50
2
gravatar for Devon Ryan
4 months ago by
Devon Ryan81k
Freiburg, Germany
Devon Ryan81k wrote:

As just mentioned on your other post, you need method="spearman", since otherwise R is seeing the unnamed argument as the thing you should be correlating tm with. Also, you want t(tm), as Kevin mentioned.

ADD COMMENTlink written 4 months ago by Devon Ryan81k
1

Good spot on "spearman" Devon.

cor(tm, use="pairwise.complete.obs", method="spearman")

or

cor(t(tm), use="pairwise.complete.obs", method="spearman")
ADD REPLYlink written 4 months ago by Kevin Blighe21k

Thanks! that solved the "y" problem. As I wrote above what I need to get according to the description of the calculation, is a correlation value per gene in each sample, and the output from cor() is only one correlation value per gene pair, I'm not sure how to get that, I guess I should do a correlation using x for one sample and y for another and do it with all possible comparisons... ?

pairwise Spearman correlation matrix is computed between the replicate signatures in the space of landmark genes with trivial self-correlations being ignored (set to 0). Then, weights for each replicate are computed as the sum of its correlations to the other replicates

From the text I understand that I need one correlation value per gene and per sample and then combine those from replicates.

I have no idea how to do that, they have a script in MATLAB to do it, but I don't know how to extrapolate into R. This is the MATLAB code:

function modz_ds = level4_to_level5(zsrep_file, col_meta_file, landmark_file, group_var)
% LEVEL4_TO_LEVEL5 Compute Moderated Z-scores (ModZ) from replicate signatures

% zsrep_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF_n67x22268.gctx';
% col_meta_file = '/Users/narayan/workspace/cmapM/data/TEST_A375_24H_ZSPCINF.map';
% landmark_file = '/Users/narayan/workspace/cmapM/data_processing/resources/L1000_EPSILON.R2.chip';

% z-scores for all replicate signatures
zsrep = parse_gctx(zsrep_file);
% sample annotations
col_meta = parse_tbl(col_meta_file, 'outfmt', 'record');
% landmark annotations
chip = parse_tbl(landmark_file, 'outfmt', 'record');

%% Generate moderated z-score (ModZ) signatures

% Exclude large outlier zscores
zsrep.mat = clip(zsrep.mat, -10, 10);

% Landmark features
pr_id_lm = {chip(strcmp('LM', {chip.pr_type})).pr_id}';
[~, lm_ridx] = intersect(zsrep.rid, pr_id_lm);

% column ids
cid = {col_meta.cid}';

% Group samples
[rep_gp, rep_idx] = getcls({col_meta.(group_var)}');

num_gp = length(rep_gp);
[num_row, num_col] = size(zsrep.mat);
modz_mat = nan(num_row, num_gp);
for ii=1:num_gp
    this_gp = rep_idx == ii;
    this_zs = zsrep.mat(:, this_gp);    
    fprintf(1, '%d/%d %s Computing ModZS %d replicates\n', ii, num_gp, rep_gp{ii}, nnz(this_gp));
    % determine weights based on replicate correlations in landmark space
    [modz_mat(:, ii), wt, cc] = modzs(this_zs, lm_ridx);          
    fprintf(1, 'Replicate correlations\n');
    disp(cc);
    fprintf(1, 'Replicate weights: ');
    disp(wt');
end
% Annotate ModZ matrix
modz_ds = mkgctstruct(modz_mat, 'rid', zsrep.rid, 'cid', rep_gp);
[~, uidx] = unique(rep_idx, 'stable');
modz_meta = keepfield(col_meta(uidx), {'rna_well', 'pert_id',...
                                       'pert_iname', 'pert_type',...
                                       'cell_id','pert_idose',...
                                       'pert_itime'});
modz_ds = annotate_ds(modz_ds, modz_meta);
%modz_ds = annotate_ds(modz_ds, row_meta, 'dim', 'row', 'keyfield', 'pr_id');
end
ADD REPLYlink written 4 months ago by Gema Sanz50
1

I like how obtuse their description is. The actual matlab code is here, which in R is:

modzs <- function(m, clipLowWt=TRUE, lowThreshWT=0.1, clipLowCC=TRUE, lowThreshCC=0, metric="avg") {
    cc = cor(m, method="spearman") # pair-wise correlations
    if(clipLowCC) { # Trim low values
        cc[cc<lowThreshCC] = lowThreshCC
    }

    wt = 0.5 * colSums(cc) # Per-sample values
    if(clipLowWt) { # Trim low values
        wt[wt<lowThreshWT] = lowThreshWT
    }

    # Normalize the weights
    if(metric=="avg") {
        sumWT = sum(abs(wt))
    } else {
        sumWT = sqrt(sum(wt^2))
    }
    normWT = wt / sumWT  # Normalized weights

    # Return the scaled input values
    return(m * normWT)
}

I've set the defaults of modzs() to match the ones used in matlab. I've not tried this on real data, so I don't have a good sense how reasonable it is.

Note that the input matrix, m, has genes as rows and samples as columns.

ADD REPLYlink modified 4 months ago • written 4 months ago by Devon Ryan81k
1

I think it is working! The function doesn't give errors and I got something that I think is the expected output

https://ibb.co/eLtkSc

ADD REPLYlink written 4 months ago by Gema Sanz50

wow, thanks A LOT!!! this is a start point, I will play around with it! :)

ADD REPLYlink written 4 months ago by Gema Sanz50

Do you code in MatLab Devon? - I don't!

ADD REPLYlink written 4 months ago by Kevin Blighe21k
1

Not since college, which was quite a while back :P

ADD REPLYlink written 4 months ago by Devon Ryan81k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 639 users visited in the last hour