Question: combining z-scores into a single z-score value
5 weeks ago by
Star20
Star20 wrote:

Hi all,

I have a list of z-scores (each obtained as effect size / standard error of the effect size). I want to combine all the z-scores with Stouffer's method, but without weights. So far I have seen "Stouffer.test" in the metaseq package, but it requires weights along with the z-scores. I have also tried the "sumz" method in the metap package, which, as far as I understand, can work without weights as follows:

`sumz(p, weights = NULL)`  -> where p is the vector of values; in my case z-scores

My understanding of Stouffer's method is

sum of all the z-scores/sqrt of total number of samples

When I compared the result of the "sumz" method as described above, it was not the same as the result of the formula above, which I calculated in Excel using the following:

=SUM(C2:C8219)/SQRT(8218)

My question is: is there any R function that works like the above (sum of all the z-scores/sqrt of total number of samples) so that I can cross-check the results? Or do I have to do it manually?

Thanks!!!

ADD COMMENTlink written 5 weeks ago by Star20

Maybe sum(p)/sqrt(numberOfSamples) ?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by zx87547.1k

What does p stand for here? The z-scores?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Star20

You mentioned p is a z-scores vector:

where p is the vector of values; in my case z-scores

sum(p)/sqrt(8218) is exactly the same thing you are trying to do with Excel =SUM(C2:C8219)/SQRT(8218)
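
For cross-checking against Excel, the same calculation can be wrapped in a small helper. This is only a minimal sketch: the function name `stouffer_unweighted` and the example values are made up for illustration, and dropping NA values is an assumption you may not want. Using `length(z)` instead of a hard-coded 8218 keeps the denominator tied to the number of z-scores actually combined.

```r
# Unweighted Stouffer's method: sum of the z-scores divided by sqrt(k),
# where k is the number of z-scores being combined.
# `stouffer_unweighted` is a hypothetical helper name, not a package function.
stouffer_unweighted <- function(z) {
  z <- z[!is.na(z)]            # assumption: silently drop missing values
  sum(z) / sqrt(length(z))
}

z <- c(1.2, -0.4, 2.1, 0.7)    # illustrative values only
combined <- stouffer_unweighted(z)
print(combined)
```

With your own data, `z` would be the column of z-scores read in from your file.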

ADD REPLYlink written 5 weeks ago by zx87547.1k

Thank you for the clarification, zx8754. I mistook the "p" in "sum(p)" for p-values rather than z-scores. Now that I have combined the z-scores, I am unsure how to interpret the result. Should I convert the combined z-score into a p-value (one- or two-tailed?) to assess significance? Apologies if the question seems naive; I am pretty new to genetics and statistics.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Star20

Hey, you stated that:

sumz(p, weights = NULL) -> where p is the vector of values; in my case z-scores

So, you have a vector, p, that contains Z-scores.

The actual sumz() function from metap package expects p-values, not Z-scores.

Description
Combine p-values using the sum z method

Usage
sumz(p, weights = NULL, data = NULL, subset = NULL, na.action = na.fail)
## S3 method for class 'sumz'
print(x, ...)

Arguments
 - p, A vector of significance values
 - weights, A vector of weights
 - data, Optional data frame containing variables
 - subset, Optional vector of logicals to specify a subset of the p-values
 - na.action, A function indicating what should happen when data contains NAs
 - x, An object of class ‘sumz’
 - ..., Other arguments to be passed through

[source: https://cran.r-project.org/web/packages/metap/metap.pdf]

If you have just Z-scores and do not want to consider weights, you can indeed calculate the overall Z-score by Stouffer's method using:

sum of all the z-scores / sqrt of total number of samples
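
To illustrate how p-values and Z-scores relate here, below is a rough base-R sketch. It assumes the sum z method turns each p-value back into a Z-score with the upper-tail normal quantile before combining, which is my reading of the method rather than something quoted from the metap manual. The example values are invented. The last two lines show the usual one-sided and two-sided conversions of a combined Z-score into a p-value.

```r
z <- c(1.2, -0.4, 2.1, 0.7)                # illustrative Z-scores only

# One-sided (upper-tail) p-values corresponding to these Z-scores
p_upper <- pnorm(z, lower.tail = FALSE)

# Converting the p-values back recovers the Z-scores, so combining
# either way gives the same unweighted Stouffer statistic
z_back <- qnorm(p_upper, lower.tail = FALSE)
combined_z <- sum(z_back) / sqrt(length(z_back))

# Converting the combined Z-score into a p-value
p_one_sided <- pnorm(combined_z, lower.tail = FALSE)  # tests for a positive overall effect
p_two_sided <- 2 * pnorm(-abs(combined_z))            # tests for an effect in either direction
```

Passing raw Z-scores as `p` would not match this, because values outside the 0 to 1 range are not valid p-values.
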
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Kevin Blighe41k

Thanks for the clarification, Kevin Blighe. I have combined the z-scores using Stouffer's method, but I am not sure how to interpret the result.

ADD REPLYlink written 5 weeks ago by Star20

In that case, perhaps you should consider why you chose to use the test in the first instance (?). Do you even need to use it?

ADD REPLYlink written 5 weeks ago by Kevin Blighe41k

I am trying to find the differentially expressed genes present within the tissues, so I have z-scores for all the genes in each tissue. I have combined the z-scores for the genes in each tissue, and now I want to see whether I can identify or rank the tissues by their pathogenicity using the combined z-scores.

ADD REPLYlink written 5 weeks ago by Star20

Oh, I see, you now have a combined Z-score for each tissue. Are the results what you expected, if you order them high-to-low?

ADD REPLYlink written 5 weeks ago by Kevin Blighe41k

I don't know how to interpret the z-scores. I am assuming that I need to convert the z-scores into p-values and confidence intervals first?

ADD REPLYlink written 4 weeks ago by Star20

Hey, oh, then why did you choose that test if you do not know how to interpret the result? You must have seen it in a publication, right? Are you sure that you need to do the analysis that you are doing?

If you feel that you need greater statistical advice, then you could try CrossValidated (StackExchange), which is more aligned toward statistics. Biostars is a broad/general forum for bioinformatics.

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

Thank you, Kevin, for the reply. Actually, my main goal is to combine either z-scores or p-values so that a single value can represent a single tissue. I was initially using Stouffer's method, which did not give the expected results because it takes into account the direction of each gene's effect (positive or negative), which we are not considering at this stage. So now I am exploring other methods.

ADD REPLYlink written 4 weeks ago by Star20

What if you only consider the genes that have positive Z-scores? I think that most people define a tissue by what is highly expressed, not by what is not expressed. I find better results this way, too.
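
As a rough sketch of why direction matters, the example below (invented values, hypothetical helper name `stouffer`) combines a mix of positive and negative Z-scores, then only the positive ones. Note that after filtering on sign the result no longer follows a standard normal distribution under the null, so I would treat it as a ranking score rather than convert it to a p-value.

```r
# Hypothetical helper: unweighted Stouffer combination
stouffer <- function(z) sum(z) / sqrt(length(z))

z <- c(2.5, -2.4, 3.1, -3.0, 0.2)   # illustrative gene-level Z-scores for one tissue

# Positive and negative Z-scores largely cancel each other out
all_genes <- stouffer(z)

# Keep only genes with positive Z-scores, then combine
z_pos <- z[z > 0]
positive_only <- stouffer(z_pos)

print(c(all_genes = all_genes, positive_only = positive_only))
```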

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

Another method you may consider is to simply define a list of genes for each tissue based on Z>2 or Z>3, and then use GSVA to enrich your data against these lists. This will then return 'scores' for the samples in your data for each tissue. As in, it will say by how much each tissue is enriched in your data.
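
Building the per-tissue gene lists could look something like the sketch below. The matrix `z_mat`, the tissue names and the values are all made up for illustration; only the Z>2 cut-off comes from the suggestion above. The resulting named list is the kind of object you would then hand to GSVA as gene sets (the GSVA call itself is left out here because its interface depends on the package version).

```r
# Hypothetical matrix of gene-level Z-scores: rows = genes, columns = tissues
z_mat <- matrix(
  c(3.2, 0.5,
    1.1, 2.6,
    2.4, 3.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("BRCA1", "TP53", "BRCC3"), c("TissueA", "TissueB"))
)

threshold <- 2   # Z > 2; use 3 for a stricter list

# One gene list per tissue: genes whose Z-score exceeds the threshold
signatures <- lapply(colnames(z_mat), function(tissue) {
  rownames(z_mat)[z_mat[, tissue] > threshold]
})
names(signatures) <- colnames(z_mat)

print(signatures)
```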

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

Hi Kevin, thanks for the reply. At this stage I am trying to find the differential expression of tissues regardless of whether genes show increased or decreased expression (if this makes sense).

In the sentence you used above, what do you mean by "This will then return 'scores' for the samples in your data for each tissue"? What does the word "sample" refer to? Can it refer to the "genes" in a particular tissue?

ADD REPLYlink written 29 days ago by Star20

Hey, GSVA will take this data:

           Sample1  Sample2  Sample3
BRCA1      6        4        3
TP53       3        3        2
BRCC3      7        12       8
...        ...      ...      ...

It then runs its algorithm against:

Signature1
TP53; BRCC3; ...

Signature2
BRCA1; TP53

GSVA will then return:

             Sample1  Sample2  Sample3
Signature1   3.4      12.6     8.3
Signature2   2.7      10.4     5.5
ADD REPLYlink modified 29 days ago • written 29 days ago by Kevin Blighe41k