Are there any methods to quantify genomic instability in cancer samples based on copy number data (from SNP or aCGH arrays)? Is it possible to compare samples by genomic instability? Any publications or links would be great. Thank you.
It's certainly possible; there are several approaches that vary in statistical sophistication and complexity. I tend to use simple methods myself. In a recent paper, I used the UCSF BAC array system (~2000 probes, triplicates, very reliable) and defined a threshold above/below which a BAC was called "altered". I defined bins for the genome and assigned each bin the value for the closest BAC. I then counted the number of alterations at individual loci, chromosomal regions, genome-wide, etc. This lets you get a simple read on degree of aneuploidy, frequently altered regions, etc. You can describe tumors by percent of the genome altered, look by chromosome, etc.
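The threshold, bin and count approach described above can be sketched as follows. This is a minimal illustration, not the code used in that paper. The ±0.2 log2-ratio threshold, the bin size and the toy probe positions are hypothetical placeholders, and each chromosome is assumed to carry at least one BAC.

```python
from bisect import bisect_left

# Hypothetical log2(tumor/reference) cutoff; the answer does not state the value used.
THRESHOLD = 0.2


def call_probe(log2_ratio, threshold=THRESHOLD):
    """Call one BAC: +1 = gain, -1 = loss, 0 = unaltered."""
    if log2_ratio > threshold:
        return 1
    if log2_ratio < -threshold:
        return -1
    return 0


def bin_calls(probes, chrom_length, bin_size):
    """Give each fixed-size bin on one chromosome the call of the BAC nearest its center.

    probes: non-empty list of (position, log2_ratio), sorted by position.
    """
    positions = [pos for pos, _ in probes]
    calls = [call_probe(ratio) for _, ratio in probes]
    result = []
    for start in range(0, chrom_length, bin_size):
        center = start + bin_size // 2
        i = bisect_left(positions, center)
        # The nearest BAC is one of the two neighbors of the insertion point.
        neighbors = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        nearest = min(neighbors, key=lambda j: abs(positions[j] - center))
        result.append(calls[nearest])
    return result


def fraction_altered(genome_bins):
    """Overall and per-chromosome fraction of bins called gained or lost."""
    per_chrom = {c: sum(b != 0 for b in bins) / len(bins) for c, bins in genome_bins.items()}
    n_bins = sum(len(bins) for bins in genome_bins.values())
    n_altered = sum(sum(b != 0 for b in bins) for bins in genome_bins.values())
    return n_altered / n_bins, per_chrom


def alteration_frequency(tumors, chrom):
    """Fraction of tumors altered at each bin of one chromosome (identical binning assumed)."""
    n_bins = len(tumors[0][chrom])
    return [sum(t[chrom][k] != 0 for t in tumors) / len(tumors) for k in range(n_bins)]


# Toy example: two short chromosomes, 10-unit bins.
probes = {"chr1": [(5, 0.05), (15, 0.45), (25, 0.50)], "chr2": [(5, -0.35), (15, 0.0)]}
lengths = {"chr1": 30, "chr2": 20}
tumor = {c: bin_calls(p, lengths[c], bin_size=10) for c, p in probes.items()}
overall, per_chrom = fraction_altered(tumor)
print(f"{overall:.0%} of bins altered; per chromosome: {per_chrom}")
```

`fraction_altered` gives the genome-wide and per-chromosome summaries, while `alteration_frequency` applied across many tumors points to frequently altered regions.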
For copy number assessed by SNP arrays, I use DNAcopy to make copy number calls for sets of contiguous loci. Again, it's a judgement call to decide where a region counts as amplified; in one data set I was fortunate to have TaqMan data at a locus of interest and used that as a gold standard. One could then proceed as described above.
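Where a gold-standard assay is available at some locus, as with the TaqMan measurements mentioned above, one simple way to turn it into a calling threshold is to pick the cutoff on the segment means that agrees best with it. The sketch below assumes one DNAcopy-style segment mean (the `seg.mean` column of its segment table) per sample at that locus; the sample values and candidate cutoffs are invented for illustration, and plain agreement is only one possible criterion.

```python
def calibrate_threshold(seg_means, gold_amplified, candidates):
    """Choose the amplification cutoff that best matches an independent assay.

    seg_means: segment mean log2 ratio covering the locus, one per sample.
    gold_amplified: True where the gold-standard assay calls the locus amplified.
    candidates: cutoffs to try; ties go to the earliest candidate.
    Returns (best_cutoff, fraction_of_samples_in_agreement).
    """
    best_cutoff, best_agreement = None, -1.0
    for cutoff in candidates:
        hits = sum((m > cutoff) == gold for m, gold in zip(seg_means, gold_amplified))
        agreement = hits / len(seg_means)
        if agreement > best_agreement:
            best_cutoff, best_agreement = cutoff, agreement
    return best_cutoff, best_agreement


def call_segments(seg_means, cutoff):
    """Apply the calibrated cutoff: True = amplified."""
    return [m > cutoff for m in seg_means]


# Hypothetical data: six tumors with array segment means and TaqMan calls at one locus.
array_means = [0.05, 0.12, 0.35, 0.60, 0.90, 0.25]
taqman_amplified = [False, False, False, True, True, True]
cutoff, agreement = calibrate_threshold(array_means, taqman_amplified, [0.1, 0.2, 0.3, 0.4, 0.5])
print(cutoff, round(agreement, 3), call_segments(array_means, cutoff))
```

With only a handful of validated samples the chosen cutoff can be unstable, and it is calibrated at a single locus, so applying it genome-wide is still a judgement call.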
An inherent difficulty in this process is that the degree of normal DNA contamination (always present to some extent in tumor samples) will affect your results. Several groups have published more sophisticated methods that use information from all of the samples to make alteration calls more accurate than simple thresholding schemes. Have a look at GISTIC (from the Broad), STAC, MSA, or Shah et al. to get an idea of how that works.
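To see why contamination matters for threshold-based calls, consider a simple mixing model (an assumption for illustration, not a method from the answer): a fraction `purity` of cells carries copy number c at a locus, the remaining cells and the reference are diploid, and tumor ploidy and array normalization are ignored. The expected ratio is then (purity * c + 2 * (1 - purity)) / 2, so a one-copy loss moves from log2 = -1 in a pure sample to about -0.42 at 50% purity. The 0.3 cutoff below is hypothetical.

```python
import math


def expected_log2_ratio(copy_number, purity, normal_cn=2):
    """Expected log2(tumor/reference) at a locus under a simple two-population mixture.

    Assumes contaminating normal cells and the reference are diploid (normal_cn)
    and ignores tumor ploidy and normalization effects.
    """
    mixed = purity * copy_number + (1 - purity) * normal_cn
    return math.log2(mixed / normal_cn)


THRESHOLD = 0.3  # hypothetical fixed calling cutoff on |log2 ratio|

for purity in (1.0, 0.7, 0.5, 0.3):
    loss = expected_log2_ratio(1, purity)
    gain = expected_log2_ratio(3, purity)
    detected = (abs(loss) > THRESHOLD, abs(gain) > THRESHOLD)
    print(f"purity {purity:.1f}: loss {loss:+.2f}, gain {gain:+.2f}, called (loss, gain): {detected}")
```

Under these assumptions a fixed 0.3 cutoff still catches one-copy changes at 50% purity but misses both at 30%, so two tumors with identical genomes can show very different percentages of genome altered.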
Most work I've seen (or done) in this area looks for loci that are frequently altered in association with some phenotype. However, I recommend looking at Donna Albertson's work in breast cancer associating different alteration profiles with distinct phenotypes (e.g. Fridlyand, BMC Cancer 2006; Chin, Cancer Cell 2006). Note that I have collaborated with Donna, so that's not a completely impartial citation, but her group did fundamental work in this area long before I met her.
Assuming that what you ideally want is a single number representative of a tumor's genomic instability, I would say that, to me, this is a hard problem and almost infeasible, simply because genomic instability has several sources that are difficult to merge into a single measure. The references given in David's answer (Fridlyand et al., Chin et al.) give a good idea of this diversity in the case of breast cancer.
I previously worked on 300 breast cancer samples, all hybridized on the Agilent 244K aCGH platform. That was a fairly large set, and I was trying to answer the question you posed. I would distinguish three components of genomic instability:
- percentage of altered genome
- number of breakpoints (related to the number of events)
- nature of the alterations found (amplifications, etc.)
I don't think these diverse components can be combined into a single measure; rather, they represent different kinds of instability that are not necessarily directly comparable. It certainly means that the biological mechanisms involved are different. I would instead try to group tumors by kind of instability, as is done in those publications.
These components are difficult to merge, but you cannot omit any of them either, because taken individually these measures are not informative enough. Consider the proportion of altered genome. In an (unrealistic) example, one tumor loses chromosomes 1, 2 and 3 entirely, giving ~20% of the genome altered, while another has many focal amplifications, losses and gains involving every chromosome, for a total of 15%. Would you say the first tumor is more unstable? Certainly not.
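The three components listed above can be computed separately from a segmented profile and reported side by side instead of being collapsed into one number. Below is a sketch on two toy profiles in the spirit of the example just given. Assumptions: segments look like DNAcopy output (chromosome, start, end, segment mean), the gain and amplification cutoffs are hypothetical, a breakpoint is counted wherever two adjacent segments on the same chromosome receive different calls, and the "genome" is 15 equal toy chromosomes.

```python
from dataclasses import dataclass

GAIN_CUTOFF = 0.2  # hypothetical |log2 ratio| cutoff for gain/loss
AMP_CUTOFF = 0.8   # hypothetical cutoff for high-level amplification


@dataclass
class Segment:
    chrom: str
    start: int       # inclusive
    end: int         # exclusive
    seg_mean: float  # segment mean log2 ratio


def call(seg_mean):
    """2 = amplification, 1 = gain, -1 = loss, 0 = unaltered."""
    if seg_mean >= AMP_CUTOFF:
        return 2
    if seg_mean > GAIN_CUTOFF:
        return 1
    if seg_mean < -GAIN_CUTOFF:
        return -1
    return 0


def instability_components(segments):
    """(fraction of genome altered, breakpoints, amplified segments) for one tumor.

    Segments must be grouped by chromosome and sorted by start within each chromosome.
    """
    total = sum(s.end - s.start for s in segments)
    altered = sum(s.end - s.start for s in segments if call(s.seg_mean) != 0)
    breakpoints = sum(
        1
        for a, b in zip(segments, segments[1:])
        if a.chrom == b.chrom and call(a.seg_mean) != call(b.seg_mean)
    )
    amplified = sum(1 for s in segments if call(s.seg_mean) == 2)
    return altered / total, breakpoints, amplified


chroms = [f"chr{i}" for i in range(1, 16)]  # toy genome: 15 chromosomes of 100 units each

# Tumor A: whole-chromosome losses of chr1, chr2 and chr3.
tumor_a = [Segment(c, 0, 100, -0.6 if c in ("chr1", "chr2", "chr3") else 0.0) for c in chroms]

# Tumor B: one 15-unit focal event on every chromosome, cycling amplification, gain, loss.
tumor_b = []
for i, c in enumerate(chroms):
    focal_mean = (1.2, 0.5, -0.6)[i % 3]
    tumor_b += [Segment(c, 0, 40, 0.0), Segment(c, 40, 55, focal_mean), Segment(c, 55, 100, 0.0)]

for name, tumor in (("A", tumor_a), ("B", tumor_b)):
    frac, bps, amps = instability_components(tumor)
    print(f"tumor {name}: {frac:.0%} altered, {bps} breakpoints, {amps} amplified segments")
```

Tumor A scores higher on percentage altered, tumor B on breakpoints and amplifications, which is the point made above: each component alone ranks these tumors differently.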
The number of breakpoints is very interesting, and I found it to be a better 'single' measure. In my experience, however, it was difficult to measure reliably. This number depends too much on the quality of your hybridization and on the sensitivity of your segmentation and gain/loss calling algorithms. I used DNAcopy with default parameters, which is very sensitive: with Agilent 244K data it catches almost every local trend, generating many false-positive breakpoints. At the time, I did not find a segment-merging algorithm that suited my needs. One option is to exclude or re-hybridize tumors of poor hybridization quality, but depending on the size of your set and the money you can invest, that is often not feasible!
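One crude way to reduce over-segmentation before counting breakpoints is to merge neighboring segments whose means are too close to be a real change. The sketch below is a naive greedy heuristic, not the approach used in the answer (which notes that no satisfactory merging method was found at the time); the 0.1 cutoff and the toy segments are hypothetical, and a fixed cutoff can chain together a slow drift, so results still need checking against the raw profiles. DNAcopy's `segment()` also offers `undo.splits = "sdundo"`, which drops change-points whose means differ by less than `undo.SD` standard deviations, tackling the same problem at the segmentation step.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    num_mark: int    # number of probes in the segment
    seg_mean: float  # segment mean log2 ratio


def merge_close_segments(segments, min_diff=0.1):
    """Greedily merge adjacent segments on the same chromosome whose means differ by < min_diff.

    Each segment is compared with the running merged segment before it; the merged
    mean is the probe-weighted average of its parts. Segments must be grouped by
    chromosome and sorted by start.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev is not None and prev.chrom == seg.chrom and abs(prev.seg_mean - seg.seg_mean) < min_diff:
            n = prev.num_mark + seg.num_mark
            mean = (prev.seg_mean * prev.num_mark + seg.seg_mean * seg.num_mark) / n
            merged[-1] = Segment(prev.chrom, prev.start, seg.end, n, mean)
        else:
            merged.append(seg)
    return merged


def count_breakpoints(segments):
    """Number of segment boundaries within chromosomes."""
    return sum(1 for a, b in zip(segments, segments[1:]) if a.chrom == b.chrom)


# Hypothetical over-segmented profile: small wiggles on chr1 plus one real gain.
raw = [
    Segment("chr1", 0, 10, 10, 0.02),
    Segment("chr1", 10, 20, 10, -0.04),
    Segment("chr1", 20, 30, 10, 0.03),
    Segment("chr1", 30, 40, 10, 0.55),
    Segment("chr1", 40, 50, 10, 0.50),
    Segment("chr2", 0, 50, 50, 0.00),
]
merged = merge_close_segments(raw)
print(count_breakpoints(raw), "->", count_breakpoints(merged))
```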
Good luck!