While reading about molecular subtyping strategies for various cancers, I have come across many papers that describe specific signatures that correlate with particular disease states and are defined by a collection of microarray probes.
For example, this paper defines an "EMT signature."
My question is: what exactly are these signatures? Are they specific groups of probes, each with an expected intensity? (btw, extra points to anyone who can point me to the specific probe set that makes up the EMT signature; I couldn't find it anywhere in the text or the supplementary files!)
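Whatever form a signature is published in, it has to be matched to the probes on your own platform, and several probes often map to the same gene. The sketch below shows one common way to handle that: collapse probe-level log2 values to gene level by averaging all probes annotated to the same gene. The probe IDs, gene names, and the probe-to-gene mapping are hypothetical placeholders (in practice the mapping would come from your array's annotation file), and averaging is only one convention; keeping the probe with the highest mean or variance is another.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Collapse probe-level log2 values to gene level for ONE sample.

    probe_values: dict probe ID -> log2 intensity (hypothetical IDs).
    probe_to_gene: dict probe ID -> gene symbol; unannotated probes are dropped.
    Multiple probes for the same gene are averaged.
    """
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(vals) for gene, vals in by_gene.items()}

if __name__ == "__main__":
    probes = {"p1": 2.0, "p2": 4.0, "p3": 7.0, "p4": 1.0}
    mapping = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
    print(collapse_probes(probes, mapping))  # {'GENE_A': 3.0, 'GENE_B': 7.0}
```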
I often see papers compare their microarray data to a given signature, and describe the process as simply calculating gene expression signature scores using averaged expression data, or average log intensities, etc. I would really like to be able to definitively define the EMT signature and learn how to compare my microarrays to that signature to determine if they fit.
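"Average log intensity" and "geometric average" describe the same number on different scales: the mean of the log2 intensities equals log2 of the geometric mean of the raw intensities. The minimal sketch below computes a per-sample signature score that way, assuming raw (not yet log-transformed) intensities and a placeholder gene list (GENE_A and so on, not the published EMT genes). If your normalized data are already on a log2 scale, as RMA output is, you would simply average the values without calling log2 again.

```python
import math
from statistics import geometric_mean, mean

def signature_score(intensities, signature_genes):
    """Average log2 intensity of the signature genes in ONE sample.

    intensities: dict gene -> raw (not yet log-transformed) intensity.
    signature_genes: hypothetical gene list; genes missing from the array are skipped.
    """
    values = [intensities[g] for g in signature_genes if g in intensities]
    if not values:
        raise ValueError("no signature genes found on this array")
    return mean(math.log2(v) for v in values)

if __name__ == "__main__":
    sample = {"GENE_A": 2.0, "GENE_B": 8.0, "GENE_C": 32.0}
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    score = signature_score(sample, genes)
    # The same quantity computed via the geometric mean of the raw intensities.
    via_geo = math.log2(geometric_mean([sample[g] for g in genes]))
    print(score, via_geo)  # equal up to floating-point rounding
```

With intensities 2, 8 and 32 the log2 values are 1, 3 and 5, so the score is 3; the geometric mean of the raw values is 8, and log2(8) is also 3.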
Any help in this endeavor is much appreciated!
UPDATE: I found a list of up-regulated and down-regulated genes in another paper that I believe defines the EMT signature. Is that all a signature is? Either way, I now need to screen my microarray data against this signature and report, with some statistical support, whether my samples match it, and I'm not sure how to proceed. One paper, Cristescu et al., describes doing this: "We calculated the gene expression signature scores using the average of log intensity (also known as the geometric average) of expression of genes in the signature." I want to replicate this method, but I don't understand what it actually means in practice.
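When a signature comes as separate up- and down-regulated lists, one possible sketch is to score each sample as the average of the up genes minus the average of the down genes. Two things below go beyond the quoted Cristescu et al. sentence and are assumptions: each gene is first z-scored across the cohort (so a gene with high baseline intensity does not dominate, and the score becomes relative to the other samples in the same dataset), and the "statistically report" part is handled by a permutation test against random gene sets of the same size. Neither is presented as the paper's method; gene names are placeholders.

```python
import random
from statistics import mean, pstdev

def zscore_genes(matrix):
    """Standardize each gene across the cohort.

    matrix: dict gene -> list of log2 values, one per sample, same sample order for every gene.
    """
    z = {}
    for gene, vals in matrix.items():
        mu, sd = mean(vals), pstdev(vals)
        # A gene with no variation carries no information; set it to 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return z

def _present(z, up, down):
    """Keep only signature genes measured on the array."""
    up = [g for g in up if g in z]
    down = [g for g in down if g in z]
    if not up:
        raise ValueError("none of the up-regulated genes are on the array")
    return up, down

def sample_score(z, up, down, j):
    """Mean z of up genes minus mean z of down genes for sample j."""
    score = mean(z[g][j] for g in up)
    if down:
        score -= mean(z[g][j] for g in down)
    return score

def updown_scores(z, up, down):
    """Signature score for every sample in the cohort."""
    up, down = _present(z, up, down)
    n_samples = len(next(iter(z.values())))
    return [sample_score(z, up, down, j) for j in range(n_samples)]

def permutation_pvalue(z, up, down, j, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same sizes
    scores at least as high as the real signature in sample j."""
    up, down = _present(z, up, down)
    observed = sample_score(z, up, down, j)
    rng = random.Random(seed)
    genes = list(z)
    hits = 0
    for _ in range(n_perm):
        picked = rng.sample(genes, len(up) + len(down))
        fake_up, fake_down = picked[:len(up)], picked[len(up):]
        if sample_score(z, fake_up, fake_down, j) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    return (hits + 1) / (n_perm + 1)
```

With a realistic array (thousands of genes) the random sets are drawn from everything measured; with only a handful of genes, as in a toy example, the test has almost no resolution.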
In the same paper, the authors go on to explain that they used the EMT signature and another signature (called the MSI signature) to classify samples. They write, "The distribution tails of MSI and EMT signatures exhibit a mutually exclusive pattern and thus identify the groups of samples in the MSI and EMT groups, respectively." Whatever they did here is what I want to do, since I am looking to classify my samples in the same manner.
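One reading of the quoted sentence is: score every sample with both signatures, take the samples in the high tail of each score distribution, and check that the two tails hardly overlap. The sketch below does exactly that under stated assumptions: the high end of each score is the tail of interest, and the tail is the top 20% of the cohort. That cutoff is an illustrative choice, not the threshold Cristescu et al. used, and in practice you would inspect the score distributions before fixing it.

```python
def upper_tail(scores, frac=0.2):
    """Flag samples in the top `frac` of the cohort (ties at the cutoff are included)."""
    k = max(1, round(len(scores) * frac))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]

def classify(emt_scores, msi_scores, frac=0.2):
    """Label each sample by which signature tail it falls into."""
    emt_hi = upper_tail(emt_scores, frac)
    msi_hi = upper_tail(msi_scores, frac)
    labels = []
    for e, m in zip(emt_hi, msi_hi):
        if e and m:
            labels.append("both")  # breaks the mutually exclusive pattern
        elif e:
            labels.append("EMT")
        elif m:
            labels.append("MSI")
        else:
            labels.append("other")  # in neither tail
    return labels

if __name__ == "__main__":
    emt = [0.1, 0.2, 3.0, 0.0, 0.3]
    msi = [2.5, 0.1, 0.0, 0.2, 0.1]
    labels = classify(emt, msi)
    print(labels)  # ['MSI', 'other', 'EMT', 'other', 'other']
    print("overlapping samples:", labels.count("both"))
```

A large "both" count would suggest that, in your data, the two tails are not mutually exclusive the way the paper reports.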