Question: How to combine the expression of multiple genes for survival analysis?
gokce.ouz (Singapore) wrote 2.6 years ago:

Hi All,

We have combined 4 GEO datasets, removed batch effects using ComBat, and extracted the genes we are interested in. Each gene has multiple probes. However, one cohort is missing 3 of the probes (although it has different probes for those genes). We would like to do survival analysis by combining multiple genes. Our aim is to compare survival between high and low expression. However, we have several questions:

Genes   A1   A2   A3   B1   B2   B3   C1   C2   C3
Batch1  NA   6.1  7.6  5.0  4.4  NA   6.4  6.4  NA
Batch2  5.9  5.9  8.3  5.2  5.1  5.1  6.7  6.3  6.3
Batch3  6.4  6.4  8.2  5.1  5.3  5.3  6.7  6.7  6.7
Batch4  5.6  7.1  6.3  6.3  8.1  6.5  5.4  6.0  4.9

  1. Should we combine probes of the same gene? If yes, which method do you suggest: average, median, or something else?
  2. When combining the probes, what should we do about the missing probes? Should we exclude A1, B3 & C3 from our analysis entirely, or combine A1, A2, A3 for Batches 2, 3, 4 and combine A2, A3 for Batch 1?
  3. After combining the probes, we would like to see the effect of the 3 combined genes on survival. To get their combined expression, is it OK to use Avg(A,B,C) + 1/2 SD? Or what do you suggest?
  4. As a next step, how should we define the threshold for high/low expression? Is it OK to use a Z-score on the combined 3-gene expression to set the threshold? 0 would be the baseline, with negative values defining low expression and positive values defining high expression of the combined genes.

Thanks in advance,

Gokce

microarray survival analysis • 1.4k views
Kevin Blighe (Guy's Hospital, London) wrote 18 months ago:

Hey Gokce,

There are no definitive answers to your questions because there are no set standards for what you are asking.

I presume that this is microarray data and that you have normalised it by RMA or gcRMA.

1) Should we combine probes of the same gene? If yes, which method do you suggest: average, median, or something else?

There is no single answer. Some people favour the mean, whilst others prefer the median. Each has its own advantages and disadvantages, and both are open to criticism. In this situation, I don't see any problem with taking the mean. My logic is that, since you have already performed normalisation, highly variable probes will already have been dealt with, and possibly excluded, at that stage.

2) When combining the probes, what should we do about the missing probes? Should we exclude A1, B3 & C3 from our analysis entirely, or combine A1, A2, A3 for Batches 2, 3, 4 and combine A2, A3 for Batch 1?

If only 1 of 3 values is missing, then just get the mean of the 2 probes for which you do have values.
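
As a rough illustration of that suggestion, here is a minimal Python sketch that averages whichever probes are present for each gene, using the values from the table in the question. The helper name `gene_means` and the rule that a gene is the probe name minus its trailing digit (A1 -> A) are assumptions made for this toy table only; in real data you would do this per sample, with a proper probe-to-gene annotation.

```python
import math
from statistics import fmean

# Probe-level values copied from the table in the question (NA -> NaN).
NA = math.nan
probes = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"]
batches = {
    "Batch1": [NA, 6.1, 7.6, 5.0, 4.4, NA, 6.4, 6.4, NA],
    "Batch2": [5.9, 5.9, 8.3, 5.2, 5.1, 5.1, 6.7, 6.3, 6.3],
    "Batch3": [6.4, 6.4, 8.2, 5.1, 5.3, 5.3, 6.7, 6.7, 6.7],
    "Batch4": [5.6, 7.1, 6.3, 6.3, 8.1, 6.5, 5.4, 6.0, 4.9],
}


def gene_means(values, probe_names):
    """Average the probes of each gene, skipping missing (NaN) values.

    Assumption for this toy table: the gene is the probe name without
    its trailing digit (A1 -> A).
    """
    by_gene = {}
    for name, value in zip(probe_names, values):
        if not math.isnan(value):
            by_gene.setdefault(name[:-1], []).append(value)
    return {gene: fmean(vals) for gene, vals in by_gene.items()}


combined = {batch: gene_means(vals, probes) for batch, vals in batches.items()}
for batch, genes in combined.items():
    print(batch, {g: round(m, 2) for g, m in genes.items()})
```

For Batch1, gene A is the mean of A2 and A3 only, whereas for the other batches it is the mean of all three probes. Swapping `fmean` for `statistics.median` would give the median-based alternative.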

3) After combining the probes, we would like to see the effect of the 3 combined genes on survival. To get their combined expression, is it OK to use Avg(A,B,C) + 1/2 SD? Or what do you suggest?

I don't understand your question, particularly why you would want to add 1/2 SD.

4) As a next step, how should we define the threshold for high/low expression? Is it OK to use a Z-score on the combined 3-gene expression to set the threshold? 0 would be the baseline, with negative values defining low expression and positive values defining high expression of the combined genes.

I would start by dividing the expression range for each gene into tertiles, as follows:

  • lower tertile = low expression
  • middle tertile = normal expression
  • upper tertile = high expression

Thus, you will have 3 curves in your survival plot, one per tertile.
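
The tertile split could be sketched in Python roughly as follows. The function names `tertile_labels` and `kaplan_meier` are hypothetical helpers, and the expression scores, follow-up times and event flags are made-up values for illustration only, not taken from the data in this thread. The Kaplan-Meier part is a bare-bones product-limit estimate, just enough to show one curve per tertile.

```python
from statistics import quantiles


def tertile_labels(scores):
    """Label each score low / middle / high using the two tertile cut points."""
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:
            labels.append("low")
        elif s <= high_cut:
            labels.append("middle")
        else:
            labels.append("high")
    return labels


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    events[i] is 1 if the event (e.g. death) was observed at times[i],
    0 if the sample was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up values for illustration only; not from the original data.
scores = [5.1, 5.4, 5.8, 6.0, 6.3, 6.5, 6.9, 7.2, 7.6]  # expression of one gene
times = [30, 12, 25, 18, 40, 9, 14, 6, 20]              # follow-up, e.g. months
events = [0, 1, 1, 1, 0, 1, 1, 1, 0]                    # 1 = event, 0 = censored

groups = tertile_labels(scores)
for group in ("low", "middle", "high"):
    idx = [i for i, g in enumerate(groups) if g == group]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(group, curve)
```

Each printed list is the step function for one tertile. Formally comparing the three curves would need something like a log-rank test, which is not shown here.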

My main worry about your data actually relates to the mention of having corrected for batch. How can you be sure that you have adequately corrected for this? Batch correction is an area that comes under question time and time again, and one must be sure that one is not just introducing further bias or confounding in the attempt to correct for batch. Have you done a PCA to gauge the correction?
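
One possible way to run that PCA check, sketched in Python with NumPy. The helper `pca_scores` is a hypothetical name, and the expression matrix is simulated (two batches with an artificial shift) purely to show what an uncorrected batch effect looks like on the first principal component. With real data you would pass in the samples x genes matrix before and after correction.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples (rows) onto the top principal components.

    expr: 2-D array, samples x genes.
    Returns (scores, explained_variance_ratio) for the top components.
    """
    centered = expr - expr.mean(axis=0)
    # SVD of the centred matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data for illustration only: 2 batches of 10 samples, 50 genes,
# with a constant shift added to the second batch to mimic a batch effect.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))
batch = np.array([0] * 10 + [1] * 10)
expr[batch == 1] += 3.0

scores, explained = pca_scores(expr)
for b in (0, 1):
    print("batch", b, "mean PC1 =", round(float(scores[batch == b, 0].mean()), 2))
print("variance explained by PC1, PC2:", np.round(explained, 2))
```

In this simulated case the two batches sit on opposite sides of PC1. After a successful correction, samples should no longer group by batch on the leading components, although that alone does not prove that no bias was introduced.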

Hope that this helps

Kevin
