Question: TCGA Segment Mean, GISTIC and CNVs
5.7 years ago by
Jimbou800 wrote:


I have questions regarding the CNV calls calculated from TCGA.
As far as I understand, they used the CBS (circular binary segmentation) algorithm to find segments that are changed compared to a reference, and the segment mean value is a measure of this change: in general, the mean log2 ratio of the probe intensities.
The segments can then be called as deletions or duplications when the segment mean lies beyond a threshold (which you define yourself; several papers used +/-0.2).

Sample    Chromosome    Start    End    Number_of_probes Segment_Mean
TCGA-CC-A8HV-01    chr1    51598    5999008    100    -0.0325
TCGA-CC-A8HV-01    chr1    6001979    6002289    153   -2.1264
TCGA-CC-A8HV-01    chr1    6002874    14443436    2    -0.0923
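
A minimal Python sketch of the thresholding described above, applied to the three rows of the table. The function name `classify_segment` and the labels "gain", "loss" and "neutral" are illustrative choices, and the default cutoff of 0.2 is just the value mentioned above, not an official TCGA setting.

```python
# Rows copied from the table above:
# (sample, chromosome, start, end, number_of_probes, segment_mean)
SEGMENTS = [
    ("TCGA-CC-A8HV-01", "chr1", 51598, 5999008, 100, -0.0325),
    ("TCGA-CC-A8HV-01", "chr1", 6001979, 6002289, 153, -2.1264),
    ("TCGA-CC-A8HV-01", "chr1", 6002874, 14443436, 2, -0.0923),
]


def classify_segment(segment_mean, threshold=0.2):
    """Label a segment by comparing its mean log2 ratio to +/- threshold.

    The threshold is user-defined (assumption: 0.2, as used in several papers).
    """
    if segment_mean > threshold:
        return "gain"
    if segment_mean < -threshold:
        return "loss"
    return "neutral"


# One call per segment, in table order.
calls = [classify_segment(row[5]) for row in SEGMENTS]

for row, call in zip(SEGMENTS, calls):
    sample, chrom, start, end, n_probes, seg_mean = row
    print(f"{sample}\t{chrom}:{start}-{end}\t{seg_mean:+.4f}\t{call}")
```

With a cutoff of 0.2, only the second segment (-2.1264) is called as a loss; the other two fall inside the neutral band.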

Afterwards, TCGA "re"calculated the CNV detection results in cancer samples (to improve them?) by running GISTIC2 on the segmentation data. Is this right?

I compared some of the segment mean data with the GISTIC2 results (estimates) for cancer samples and found differences at both the gene and the sample level.

If the GISTIC2 method provides better results, do I then have to use a similar algorithm for non-cancer healthy samples and germline CNVs? Which tools would those be? Can I use GISTIC for this as well?



gistic2 affy tcga cnv • 13k views
modified 5.6 years ago by Bontus80 • written 5.7 years ago by Jimbou800

Hi Jimbou! Were you able to find the answer to your question? I would like to know which tools are used by TCGA to analyze SNP array data for copy number analysis.

written 4.7 years ago by Dataman330
5.6 years ago by
Bontus80 wrote:

Hi Jimbou,

I am struggling with similar questions over here, as the threshold used is very often described arbitrarily in the literature without further explanation or reasoning. What I have found so far concerning GISTIC is the following (see

"What is GISTIC? What is RAE?

Copy number data sets within the portal are generated by GISTIC or RAE algorithms. Both algorithms attempt to identify significantly altered regions of amplification or deletion across sets of patients. Both algorithms also generate putative gene/patient copy number specific calls, which are then input into the portal.

For TCGA studies, the table in all_thresholded.by_genes.txt (which is the part of the GISTIC output that is used to determine the copy-number status of each gene in each sample in cBioPortal) is obtained by applying both low- and high-level thresholds to the gene copy levels of all the samples. The entries with value +/- 2 exceed the high-level thresholds for amps/dels, and those with +/- 1 exceed the low-level thresholds but not the high-level thresholds. The low-level thresholds are just the 'amp_thresh' and 'del_thresh' noise threshold input values to GISTIC (typically 0.1 or 0.3) and are the same for every sample.

By contrast, the high-level thresholds are calculated on a sample-by-sample basis and are based on the maximum (or minimum) median arm-level amplification (or deletion) copy number found in the sample. The idea, for deletions anyway, is that this level is a good approximation for hemizygous given the purity and ploidy of the sample. The actual cutoffs used for each sample can be found in a table in the output file sample_cutoffs.txt. All GISTIC output files for TCGA are available at:"
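
To make the thresholding in the quote concrete, here is a minimal Python sketch of how a gene-level value could be mapped to -2, -1, 0, 1 or 2. This is not GISTIC's own code: the function name `thresholded_call`, the sample names and all cutoff values are hypothetical, and I am assuming `del_thresh` is given as a positive magnitude. In a real analysis the per-sample high-level cutoffs would be taken from sample_cutoffs.txt.

```python
def thresholded_call(value, high_del, high_amp, amp_thresh=0.1, del_thresh=0.1):
    """Map a gene-level log2 copy value to a call in {-2, -1, 0, 1, 2}.

    amp_thresh / del_thresh: low-level noise thresholds, the same for every
    sample (assumption: del_thresh is a positive magnitude).
    high_del / high_amp: sample-specific high-level cutoffs (hypothetical
    values here; high_del is negative, high_amp is positive).
    """
    if value > high_amp:
        return 2    # exceeds the high-level amplification cutoff
    if value > amp_thresh:
        return 1    # exceeds only the low-level amplification threshold
    if value < high_del:
        return -2   # exceeds the high-level deletion cutoff
    if value < -del_thresh:
        return -1   # exceeds only the low-level deletion threshold
    return 0        # within the noise band


# Hypothetical per-sample cutoffs: sample -> (high_del, high_amp).
SAMPLE_CUTOFFS = {
    "sample_A": (-0.8, 0.9),
    "sample_B": (-1.2, 0.6),
}

# The same gene-level value can give different calls in different samples.
gene_value = -1.0
calls = {
    sample: thresholded_call(gene_value, high_del, high_amp)
    for sample, (high_del, high_amp) in SAMPLE_CUTOFFS.items()
}
print(calls)
```

With these made-up cutoffs, a value of -1.0 is called -2 in sample_A but only -1 in sample_B, which illustrates why the thresholded calls can differ from what a single fixed cutoff on the segment mean would give.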

Hope this helps, though I have not yet managed to obtain a copy of the 'sample_cutoffs.txt' for my cancer cohort. If you find any more information, please share.


modified 5.6 years ago • written 5.6 years ago by Bontus80