In a collaborative project, an initial microarray analysis identified a small gene signature that discriminates reasonably well between control and tumor samples. More importantly, these genes are also correlated with specific clinical imaging variables. Our next step is therefore to validate the prognostic impact of these genes in an independent TCGA RNA-Seq dataset of the same cancer (the TCGA datasets do not include the imaging variables), and especially to test whether the correlated signature can identify patient groups with different survival estimates.
My main question is the following: since this signature is relatively small (22 genes: 10 up-regulated and 12 down-regulated in tumors, according to the microarray DE analysis), how should I proceed to cluster the tumor samples of the independent RNA-Seq dataset based on this signature? Should I leave out the normal samples included in the dataset and focus only on the tumors?
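One way to restrict a TCGA expression matrix to tumors is sketched below. It assumes sample-level TCGA barcodes such as "TCGA-AB-1234-01A-11R-A123-07", where the first two digits of the fourth field are the sample-type code (01-09 tumor, 10-19 normal). The helper names and example barcodes are hypothetical; keeping only primary tumors (01) with one sample per patient is an assumed choice for survival analysis, not something taken from the dataset description.

```python
# Hypothetical helpers: classify TCGA barcodes by their sample-type code.

def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code from the 4th barcode field."""
    fields = barcode.split("-")
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode}")
    return int(fields[3][:2])

def split_tumor_normal(barcodes):
    """Separate tumor (codes 01-09) from normal (codes 10-19) samples."""
    tumors = [b for b in barcodes if 1 <= sample_type_code(b) <= 9]
    normals = [b for b in barcodes if 10 <= sample_type_code(b) <= 19]
    return tumors, normals

def primary_tumors_one_per_patient(barcodes):
    """Keep primary solid tumors (code 01), first sample seen per patient."""
    chosen = {}
    for b in barcodes:
        if sample_type_code(b) == 1:
            patient = "-".join(b.split("-")[:3])  # e.g. "TCGA-AB-1234"
            chosen.setdefault(patient, b)
    return list(chosen.values())

# Made-up example barcodes, for illustration only.
example = [
    "TCGA-AB-1234-01A-11R-A123-07",  # primary tumor
    "TCGA-AB-1234-11A-11R-A123-07",  # matched normal
    "TCGA-CD-5678-01A-11R-A123-07",  # primary tumor
    "TCGA-CD-5678-01B-21R-A123-07",  # second primary-tumor sample, same patient
    "TCGA-EF-9012-06A-11R-A123-07",  # metastatic
]
tumors, normals = split_tumor_normal(example)
primaries = primary_tumors_one_per_patient(example)
```

The normals removed this way could still be used separately, for instance to check that the up/down direction of the 22 genes replicates in the RNA-Seq data.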
For example, should I compute, for each gene, its median expression across all tumors and split the tumor samples into high and low groups, and then combine the genes into an average score for the whole signature? Or is this biased, since it ignores the direction of regulation found in the microarray analysis, and should I use something more robust?
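A direction-aware version of the averaged score might look like the sketch below: each gene is z-scored across tumors only, down-regulated genes enter with a negative sign (mean z of up genes minus mean z of down genes), and the combined score is split at its median. It assumes the input is already normalized, log-scale expression (for example log2-transformed normalized counts) stored as gene name -> list of per-tumor values; the function names and toy gene names are hypothetical.

```python
from statistics import mean, median, pstdev

def zscore_rows(expr):
    """Standardize each gene across tumor samples (gene -> list of values)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def signature_score(expr, up_genes, down_genes):
    """Per-sample score: mean z of up genes minus mean z of down genes."""
    z = zscore_rows(expr)
    n_samples = len(next(iter(z.values())))
    # Genes absent from the RNA-Seq matrix (e.g. unmapped probes) are skipped.
    up = [g for g in up_genes if g in z]
    down = [g for g in down_genes if g in z]
    scores = []
    for j in range(n_samples):
        up_part = mean(z[g][j] for g in up) if up else 0.0
        down_part = mean(z[g][j] for g in down) if down else 0.0
        scores.append(up_part - down_part)
    return scores

def median_split(scores):
    """Label samples 'high' above the median score, otherwise 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Toy data for 4 tumors (made-up values, for illustration only).
expr = {"UP_A": [1, 2, 3, 4], "UP_B": [2, 4, 6, 8], "DOWN_A": [4, 3, 2, 1]}
scores = signature_score(expr, ["UP_A", "UP_B"], ["DOWN_A"])
groups = median_split(scores)
```

Because every gene is standardized before averaging, no single highly expressed gene dominates the score, and a tumor is "high" only if its up genes are high and its down genes are low together.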
My main idea is not to run a separate survival analysis for each gene, but rather, if possible, to use the whole signature to identify groups of tumor samples that are distinguished by subsets of these genes.
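To find groups from the whole signature rather than from a single score, one option is unsupervised clustering of the tumors on the z-scored signature genes, sketched below with plain k-means and k = 2. Both k-means and k = 2 are assumed, swapped-in choices (hierarchical or consensus clustering are common alternatives), and the helper names and toy values are hypothetical. The cluster labels are arbitrary and would then be compared with a Kaplan-Meier / log-rank analysis, for example R's survival::survdiff or lifelines' logrank_test.

```python
from statistics import mean, pstdev

def sample_vectors(expr, genes):
    """Return one z-scored vector per tumor sample, restricted to `genes`."""
    rows = []
    for g in genes:
        vals = expr[g]
        mu, sd = mean(vals), pstdev(vals)
        rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in vals])
    return [list(col) for col in zip(*rows)]  # samples x genes

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans2(points, n_iter=100):
    """k-means with k = 2 and deterministic farthest-point initialization."""
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = None
    for _ in range(n_iter):
        new = [min((0, 1), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels

# Toy log-expression for 6 tumors (made-up values, for illustration only).
expr = {
    "UP_A": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "UP_B": [2.0, 2.1, 1.9, 7.0, 7.2, 6.9],
    "DOWN_A": [6.0, 6.1, 5.9, 1.0, 1.1, 0.8],
}
labels = kmeans2(sample_vectors(expr, ["UP_A", "UP_B", "DOWN_A"]))
```

Unlike the single-score median split, this lets tumors group on any combination of the 22 genes; comparing the mean up/down z-scores within each cluster shows which gene subsets drive the separation.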
Thank you in advance,