21 months ago by
University College London
You are referring to a post that I made. From where did you obtain the original data: Broad FireBrowse (somatic copy number alterations), or did you just download the original files from GDC?
If you followed the data processing exactly as follows:
- Part I - download segmented sCNA data for any TCGA cohort from Broad Institute's FireBrowse server and identify recurrent sCNA regions in these with GAIA
- Part II - plot recurrent sCNA gains and losses from GAIA
- Part III - annotate the recurrent sCNA regions (this post, just below)
- Part IV - generate heatmap of recurrent sCNA regions over your cohort
Then, the statistically significant recurrent somatic copy number alterations (sCNA) are held in the *.igv.gistic files. You can extract the statistically significant regions from this file and then, using GenomicRanges, pull out the original copy number over these regions on a per-sample basis. The copy number that you take is indeed the segment mean from the original copy number program that was used (in the case of TCGA data, likely DNAcopy in R).
If you do that, then you can build a matrix of:
- statistically significant recurrent sCNAs in a group of patients as rows
- patients as columns
- segment mean over each region as the values
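To make the shape of that matrix concrete, here is a minimal sketch in plain Python. It is not the GenomicRanges workflow itself: the region names, patient IDs, coordinates, and values are all made up, and the choice to average overlapping segments weighted by the length of their overlap is my assumption for the case where more than one segment falls inside a region.

```python
from collections import defaultdict

# Hypothetical inputs; every name, coordinate, and value here is illustrative.
# Significant recurrent regions: (region_name, chrom, start, end)
regions = [
    ("gain_8q", "8", 100, 200),
    ("loss_9p", "9", 50, 150),
]

# Per-sample segments: (sample, chrom, start, end, segment_mean)
segments = [
    ("P1", "8", 0, 150, 0.8),
    ("P1", "8", 150, 300, 0.2),
    ("P1", "9", 0, 400, -1.0),
    ("P2", "8", 0, 500, 0.0),
    ("P2", "9", 100, 120, -0.5),
]


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def region_by_sample_matrix(regions, segments):
    """Return (row_names, col_names, matrix) of overlap-weighted segment means.

    Rows are regions, columns are samples. A cell is NaN when the sample
    has no segment overlapping the region.
    """
    samples = sorted({seg[0] for seg in segments})

    # Index segments by (sample, chromosome) so each lookup stays small.
    by_key = defaultdict(list)
    for sample, chrom, start, end, seg_mean in segments:
        by_key[(sample, chrom)].append((start, end, seg_mean))

    matrix = []
    for _name, chrom, r_start, r_end in regions:
        row = []
        for sample in samples:
            weighted, covered = 0.0, 0
            for s_start, s_end, seg_mean in by_key[(sample, chrom)]:
                width = overlap(r_start, r_end, s_start, s_end)
                weighted += width * seg_mean
                covered += width
            row.append(weighted / covered if covered else float("nan"))
        matrix.append(row)

    return [region[0] for region in regions], samples, matrix


row_names, col_names, matrix = region_by_sample_matrix(regions, segments)
for name, row in zip(row_names, matrix):
    print(name, dict(zip(col_names, row)))
```

In R, the overlap step is what GenomicRanges would do for you; the point of the sketch is only the layout of the result: one row per recurrent region, one column per patient, segment mean in each cell.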
With that, I generated this and identified clusters of patients based on recurrent sCNA via Partitioning Around Medoids (PAM):
Of course, you don't have to use exactly that data, but you really do have to know what your data relates to.
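For the PAM step mentioned above, the idea can be sketched as follows. This is a simplified k-medoids written in plain Python purely for illustration: it starts from the first k patients and greedily swaps medoids, so it is not the full BUILD/SWAP algorithm, and it is not what I ran (in R you would more likely reach for `pam()` from the `cluster` package). The patient profiles, the Euclidean distance, and k = 2 are all assumptions.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def simple_pam(points, k):
    """Simplified k-medoids: naive start, then greedy swaps until no gain.

    Returns (medoid_indices, labels), where labels[i] is the position in
    medoid_indices of the medoid closest to points[i].
    """
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot, candidate in itertools.product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:  # accept only strict improvements
                medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: euclidean(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical profiles: one list per patient, one value per recurrent
# region, i.e. the columns of the region-by-patient matrix.
patient_profiles = [
    [0.9, -1.0],
    [0.8, -0.9],
    [0.0, 0.1],
    [0.1, 0.0],
]

medoids, labels = simple_pam(patient_profiles, k=2)
print("medoid patients:", medoids)
print("cluster labels:", labels)
```

Note that the clustering is done on patients, so each patient's column of segment means is the profile being compared.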