Question

GISTIC 2.0 all threshold file versus genome browser

5.9 years ago

sara.younes ▴ 20

According to the GISTIC2.0 output on TCGA data, in the "all_threshold by gene" file, gene TRNT1 on chr3 has a value of -1 for case 2857, which means a loss or deletion. When I opened TCGA_AB_2857_03A_01D_0756_21.nocnv_hg19.seg, the segment file that is supposed to report the somatic CNV for this case, I found three segments reported for chr3:

Sample                        Chromosome  Start      End        Num_Probes  Segment_Mean
TCGA_AB_2857_03A_01D_0756_21  3           2212571    128168037  70409       -0.0077
TCGA_AB_2857_03A_01D_0756_21  3           128169862  128937080  309         -0.2677
TCGA_AB_2857_03A_01D_0756_21  3           128945573  197538677  35554       0.053

TRNT1 is at chr3:3,168,600-3,190,706, which maps to the first segment, but the log ratio there is -0.0077, which is really small. So I don't know why GISTIC reports -1 for this gene. Does it report -1 for the whole chromosome rather than for the focal event in this case?
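To make the mapping explicit, here is a minimal Python sketch that checks which of the three quoted chr3 segments overlap the gene. The segment values and gene coordinates are copied from the post; the function name `overlapping_segments` is only an illustrative helper, not part of GISTIC or any TCGA tool.

```python
# Segments for chromosome 3, copied from the .seg excerpt quoted above:
# (start, end, num_probes, segment_mean)
CHR3_SEGMENTS = [
    (2212571, 128168037, 70409, -0.0077),
    (128169862, 128937080, 309, -0.2677),
    (128945573, 197538677, 35554, 0.053),
]


def overlapping_segments(gene_start, gene_end, segments):
    """Return every segment whose [start, end] interval overlaps the gene."""
    return [seg for seg in segments
            if seg[0] <= gene_end and seg[1] >= gene_start]


# TRNT1 coordinates as given in the question.
hits = overlapping_segments(3168600, 3190706, CHR3_SEGMENTS)
for start, end, probes, seg_mean in hits:
    print(f"chr3:{start}-{end} probes={probes} Segment_Mean={seg_mean}")
```

Only the first segment is returned, so -0.0077 is the Segment_Mean that applies to the gene in this file.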

CNV GISTIC2.0 • 6.5k views

ADD COMMENT • link updated 5.6 years ago by achernia ▴ 30 • written 5.9 years ago by sara.younes ▴ 20

score 2 · Answer 1 · 2018-06-08

5.9 years ago

Kevin Blighe 87k

The values in the Segment_Mean column are indeed the log2 ratios for tumour versus normal. If you revert the -0.0077 value back to the linear scale, you get a value of ~1:

2^-0.0077 = 0.994677

That does indicate that there is minimal difference between tumour and normal in this segment.
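The same conversion as a small Python sketch. It assumes a diploid normal (2 copies), which matches the Segment_Mean = log2(tumour copy number / 2) convention discussed later in this thread; the function names are illustrative only.

```python
def log2_ratio_to_linear(segment_mean):
    """Convert a log2(tumour/normal) ratio back to a linear ratio."""
    return 2.0 ** segment_mean


def log2_ratio_to_copy_number(segment_mean, normal_copies=2):
    """Estimate absolute copy number, assuming the normal has `normal_copies`."""
    return normal_copies * 2.0 ** segment_mean


ratio = log2_ratio_to_linear(-0.0077)
copies = log2_ratio_to_copy_number(-0.0077)
print(round(ratio, 6))  # 0.994677, as in the calculation above
print(round(copies, 3))
```

An estimated copy number this close to 2 is what "minimal difference between tumour and normal" looks like on the linear scale.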

From where exactly did you obtain the value of -1?

I'd like to point out that your gene comprises only a very small percentage of that segment, ~0.018% (about 22 kb out of roughly 126 Mb). So, even if the gene were deleted, the effect could be drowned out due to its small representation in the broader segment.
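The proportion can be checked directly from the coordinates quoted in the question; it comes out at under 0.02% of the segment. A minimal sketch, using only the numbers from the thread:

```python
# Coordinates as quoted in the question.
gene_start, gene_end = 3168600, 3190706    # TRNT1
seg_start, seg_end = 2212571, 128168037    # first chr3 segment

gene_length = gene_end - gene_start
segment_length = seg_end - seg_start
percent_of_segment = 100.0 * gene_length / segment_length

print(f"{gene_length} bp of {segment_length} bp = {percent_of_segment:.4f}%")
```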

A good idea would be to determine copy number alterations on the TCGA data again by obtaining the aligned BAM files and using those (restricted / controlled access only). Many good copy number programs have now come out since the TCGA data that can generate copy number profiles from NGS data. Copy number in the TCGA was originally determined by Affymetrix SNP 6.0 array, which has good coverage genome-wide, but obviously certain genes may have minimal representation from the array's probes. Such technology is better for identifying very large chromosomal aberrations as opposed to smaller CNAs. However, from NGS data, detecting gene-level CNAs is certainly possible.

Kevin

ADD COMMENT • link 4.0 years ago by Kevin Blighe 87k

Thanks for your reply. I got -1 from the GISTIC 2.0 output file called "all_threshold by gene", which is supposed to be a gene-level table of discrete amplification and deletion indicators for all samples. If you open the genome browser and query the coordinates of the first segment, "TCGA_AB_2857_03A_01D_0756_21 3 2212571 128168037 70409 -0.0077", TRNT1 is only part of that segment, and the segment is "barely" deleted. However, GISTIC 2.0 reports in "all_threshold by gene" that this gene has a discretized value of -1, which means it is below the low deletion threshold, i.e. deleted. My question was: does this file report focal deletions of genes, or does it just give a -1 or -2 value to every gene found in a region that was deleted?
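As a rough illustration of what "below the low deletion threshold" would mean for a single log2 ratio, here is a minimal Python sketch. It is not GISTIC's actual procedure: the cut-offs of 0.1 are an assumption (check the amplification and deletion thresholds used in your own run), the function name is hypothetical, only the low-level calls (-1, 0, 1) are modelled, and GISTIC may apply its own preprocessing before thresholding.

```python
def discretize_log2_ratio(log2_ratio, t_amp=0.1, t_del=0.1):
    """Return 1 (gain), -1 (loss) or 0 (neither) for one log2 ratio.

    t_amp and t_del are ASSUMED low-level thresholds, not values read
    from any GISTIC run; high-level calls (+2 / -2) are not modelled.
    """
    if log2_ratio > t_amp:
        return 1
    if log2_ratio < -t_del:
        return -1
    return 0


# Segment_Mean values of the three chr3 segments quoted in the question.
calls = {seg_mean: discretize_log2_ratio(seg_mean)
         for seg_mean in (-0.0077, -0.2677, 0.053)}
print(calls)
```

Under these assumed cut-offs the raw value of the first segment would be called 0, not -1, which is exactly the discrepancy being asked about.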

ADD REPLY • link 5.9 years ago by sara.younes ▴ 20

I am not familiar with that output file, and it does not appear to be listed in the documentation (?): ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm

Also, to which Genome Browser are you referring?

Column 8 of one of the output files should indicate whether it is due to focal or broad change:

Output Files

All Lesions File (all_lesions.conf_XX.txt, where XX is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

(1) Unique Name: A name assigned to identify the region

(2) Descriptor: The genomic descriptor of that region.

(3) Wide Peak Limits: The “wide peak” boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

(4) Peak Limits: The boundaries of the region of maximal amplification or deletion.

(5) Region Limits: The boundaries of the entire significant region of amplification or deletion.

(6) q-values: The q-value of the peak region.

(7) Residual q-values: The q-value of the peak region after removing (“peeling off”) amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

(8) Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called “broad”), focal events (called “focal”), or independently significant broad and focal events (called “both”).

(9) Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.
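A minimal Python sketch of how one row of such a file could be split into the nine region columns and the per-sample values that start in column 10. The column names are taken from the documentation excerpt above and may not match the header text in an actual file; the example row is a placeholder, not real GISTIC output, and `split_lesion_row` is a hypothetical helper.

```python
# Names follow the documentation excerpt; real header text may differ.
REGION_COLUMNS = [
    "Unique Name", "Descriptor", "Wide Peak Limits", "Peak Limits",
    "Region Limits", "q-values", "Residual q-values",
    "Broad or Focal", "Amplitude Threshold",
]


def split_lesion_row(fields):
    """Split one already-tokenised row into region metadata and sample values."""
    region = dict(zip(REGION_COLUMNS, fields[:9]))
    sample_values = fields[9:]  # samples start in column 10
    return region, sample_values


# Placeholder row, NOT real GISTIC output: nine region fields, then two samples.
example_row = ["region_name", "descriptor", "wide_peak", "peak", "region",
               "q", "residual_q", "focal", "threshold_key", "0", "1"]
region, sample_values = split_lesion_row(example_row)
print(region["Broad or Focal"], sample_values)
```

For a real file, each tab-separated line would be split first (for example with `line.rstrip("\n").split("\t")`) and then passed to the helper.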

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k

Sorry Kevin,

it says here: https://gatkforums.broadinstitute.org/firecloud/discussion/8254/gistic2-0

GISTIC 2.0 Segment CN

Seg.CN = log2() -1 of copy number

You are mentioning

Segment_Mean = log2(tumour copy number / 2)

Which one is right, please?

ADD REPLY • link 5.0 years ago by zizigolu ★ 4.3k

Hey, how are you?

Both the Broad Institute and I are correct: the two formulae produce the same output, as this example shows:

CN_tumour1 <- 6
CN_tumour2 <- 1

My way:

log2(CN_tumour1 / 2)
[1] 1.584963

log2(CN_tumour2 / 2)
[1] -1

Broad's way:

log2(CN_tumour1) - 1
[1] 1.584963

log2(CN_tumour2) - 1
[1] -1
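The agreement is not specific to these two values: since log2(a / b) = log2(a) - log2(b) and log2(2) = 1, the two forms are algebraically identical. A small Python sketch (Python rather than the R used above; function names are illustrative) that checks this for a few copy numbers:

```python
import math


def segment_mean_ratio_form(copy_number):
    """log2(CN / 2), the form used earlier in this thread."""
    return math.log2(copy_number / 2)


def segment_mean_offset_form(copy_number):
    """log2(CN) - 1, the form quoted from the Broad forum post."""
    return math.log2(copy_number) - 1


for cn in (1, 2, 3, 6):
    a = segment_mean_ratio_form(cn)
    b = segment_mean_offset_form(cn)
    # abs_tol covers the CN = 2 case, where both forms are 0.
    assert math.isclose(a, b, abs_tol=1e-12)
    print(cn, round(a, 6))
```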

ADD REPLY • link 4.0 years ago by Kevin Blighe 87k

Sorry @Kevin Blighe, does it make any difference if, instead of Segment_Mean = log2(tumour copy number / 2), I use Segment_Mean = log2(tumour copy number / average ploidy of each sample)?

ADD REPLY • link 4.1 years ago by zizigolu ★ 4.3k

score 1 · Answer 2 · 2018-10-01

5.6 years ago

achernia ▴ 30

If you still have questions about GISTIC output please send questions directly to us at the Broad Institute.

Andrew Cherniack achernia@broadinstitute.org

ADD COMMENT • link 5.6 years ago by achernia ▴ 30