Question: GISTIC 2.0 all threshold file versus genome browser
5 months ago, sara.younes wrote:

According to the GISTIC 2.0 output on TCGA data, in the "all_threshold by gene" file the gene TRNT1 on chr3 has a value of -1 for case 2857, which means it has a loss or deletion. When I opened the file TCGA_AB_2857_03A_01D_0756_21.nocnv_hg19.seg, a segment file that is supposed to report the somatic CNV for this case, I found that 3 segments are reported for chr3:

Sample  Chromosome  Start   End Num_Probes  Segment_Mean
TCGA_AB_2857_03A_01D_0756_21    3   2212571 128168037   70409   -0.0077
TCGA_AB_2857_03A_01D_0756_21    3   128169862   128937080   309 -0.2677
TCGA_AB_2857_03A_01D_0756_21    3   128945573   197538677   35554   0.053

TRNT1 is at 3,168,600-3,190,706, which maps to the first segment, but the log ratio there is -0.0077, which is really small. So I don't know why GISTIC reports -1 for this gene. Or does it report -1 for the whole chromosome rather than the focal event in this case?

5 months ago, Kevin Blighe (Republic of Ireland) wrote:

The values in the Segment_Mean column are indeed the log2 ratios for tumour versus normal. If you convert the -0.0077 value back to the linear scale, you get a value of ~1:

2^-0.0077 ≈ 0.9947

That does indicate that there is minimal difference between tumour and normal in this segment, as the GISTIC formula is:

Segment_Mean = log2(tumour copy number / 2)
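
As a quick check of that arithmetic, here is a minimal sketch that converts a Segment_Mean back to a linear ratio and an estimated copy number. It assumes the formula above and a diploid normal; the function names are illustrative and not part of GISTIC.

```python
def linear_ratio(segment_mean: float) -> float:
    """Tumour/normal ratio implied by a log2 Segment_Mean."""
    return 2.0 ** segment_mean


def copy_number(segment_mean: float, normal_ploidy: float = 2.0) -> float:
    """Estimated tumour copy number, assuming a diploid normal by default."""
    return normal_ploidy * linear_ratio(segment_mean)


# The three chr3 segment means reported for this sample
for seg_mean in (-0.0077, -0.2677, 0.053):
    print(f"{seg_mean:+.4f} -> ratio {linear_ratio(seg_mean):.4f}, "
          f"copy number {copy_number(seg_mean):.3f}")
```

Under these assumptions the first segment comes out at very nearly two copies.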

From where exactly did you obtain the value of -1?

I'd like to point out that your gene makes up only a very small percentage of that segment, roughly 0.018%. So, even if the gene were deleted, the effect could be drowned out due to its small representation in the broader segment.
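
For anyone who wants to reproduce this, the sketch below finds the segment(s) overlapping the gene and computes what fraction of each segment the gene spans. The segment rows and the TRNT1 coordinates are copied from the question; the helper name and tuple layout are just illustrative choices.

```python
def overlapping_segments(segments, chrom, start, end):
    """Return segments on `chrom` that overlap the interval [start, end]."""
    return [s for s in segments
            if s[0] == chrom and s[1] <= end and s[2] >= start]


# (chromosome, start, end, num_probes, segment_mean) from the seg file in the question
SEGMENTS = [
    ("3", 2212571, 128168037, 70409, -0.0077),
    ("3", 128169862, 128937080, 309, -0.2677),
    ("3", 128945573, 197538677, 35554, 0.053),
]
GENE = ("3", 3168600, 3190706)  # TRNT1 coordinates as quoted in the question

hits = overlapping_segments(SEGMENTS, *GENE)
for chrom, seg_start, seg_end, n_probes, seg_mean in hits:
    gene_len = GENE[2] - GENE[1]
    seg_len = seg_end - seg_start
    fraction = gene_len / seg_len
    print(f"segment {seg_start}-{seg_end} (mean {seg_mean}): "
          f"gene spans {100 * fraction:.4f}% of the segment")
```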

A good idea would be to re-derive the copy number alterations for the TCGA data from the aligned BAM files (restricted / controlled access only). Many good copy number programs that can generate copy number profiles from NGS data have come out since the TCGA data were produced. Copy number in the TCGA was originally determined by Affymetrix SNP 6.0 array, which has good coverage genome-wide, but certain genes may have minimal representation among the array's probes. Such technology is better for identifying very large chromosomal aberrations than smaller CNAs. From NGS data, however, detecting gene-level CNAs is certainly possible.

Kevin


Thanks for your reply. I got -1 from the GISTIC 2.0 output file called "all_threshold by gene", which is supposed to be a gene-level table of discrete amplification and deletion indicators for all samples. If you open the genome browser and query the coordinates of the first segment, "TCGA_AB_2857_03A_01D_0756_21 3 2212571 128168037 70409 -0.0077", TRNT1 is only a small part of that segment, and the segment is "barely" deleted. However, GISTIC 2.0 reports in "all_threshold by gene" that this gene has a discretized value of -1, which means it is below the low deletion threshold, i.e. it is deleted. My question is: does this file report focal deletions of genes, or does it just give a -1 or -2 value to all the genes found in a region that was deleted?
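
To make the puzzle concrete, here is a rough sketch of discretizing a log2 ratio into the -2..2 scale with fixed cut-offs. This is not GISTIC's code: all four cut-offs are hypothetical values chosen for illustration, and the thresholds actually used for this TCGA run are not stated in this thread.

```python
def discretize(log2_ratio, low_del=-0.1, low_amp=0.1, high_del=-1.0, high_amp=1.0):
    """Map a log2 ratio to -2, -1, 0, 1 or 2 using fixed cut-offs.

    All four cut-offs are hypothetical defaults chosen for illustration;
    they are not taken from any particular GISTIC run.
    """
    if log2_ratio < high_del:
        return -2
    if log2_ratio < low_del:
        return -1
    if log2_ratio > high_amp:
        return 2
    if log2_ratio > low_amp:
        return 1
    return 0


# The three chr3 segment means from the seg file
for seg_mean in (-0.0077, -0.2677, 0.053):
    print(seg_mean, discretize(seg_mean))
```

With these illustrative cut-offs the first segment would be called 0 rather than -1, so the reported -1 presumably reflects different thresholds or different input than this segment file alone.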

(reply written 5 months ago by sara.younes)

I am not familiar with that output file, and it does not appear to be listed in the documentation either (?) - ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm

Also, to which Genome Browser are you referring?

Column 8 of one of the output files should indicate whether it is due to focal or broad change:

Output Files

  1. All Lesions File (all_lesions.conf_XX.txt, where XX is the confidence level)

The all lesions file summarizes the results from the GISTIC run. It contains data about the significant regions of amplification and deletion as well as which samples are amplified or deleted in each of these regions. The identified regions are listed down the first column, and the samples are listed across the first row, starting in column 10.

Region Data

Columns 1-9 present the data about the significant regions as follows:

(1) Unique Name: A name assigned to identify the region

(2) Descriptor: The genomic descriptor of that region.

(3) Wide Peak Limits: The “wide peak” boundaries most likely to contain the targeted genes. These are listed in genomic coordinates and marker (or probe) indices.

(4) Peak Limits: The boundaries of the region of maximal amplification or deletion.

(5) Region Limits: The boundaries of the entire significant region of amplification or deletion.

(6) q-values: The q-value of the peak region.

(7) Residual q-values: The q-value of the peak region after removing (“peeling off”) amplifications or deletions that overlap other, more significant peak regions in the same chromosome.

(8) Broad or Focal: Identifies whether the region reaches significance due primarily to broad events (called “broad”), focal events (called “focal”), or independently significant broad and focal events (called “both”).

(9) Amplitude Threshold: Key giving the meaning of values in the subsequent columns associated with each sample.
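
As a rough sketch of how one might pull column 8 out of that file, the snippet below reads a tab-delimited table and yields the region name with its broad/focal label. It assumes a single header row and the column order quoted above; the function name and the two example rows are made up for illustration and are not real GISTIC output.

```python
import csv
import io


def broad_or_focal(handle):
    """Yield (unique_name, broad_or_focal) from a tab-delimited all-lesions table.

    Assumes one header row and the column order quoted above
    (column 1 = Unique Name, column 8 = Broad or Focal).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) >= 8:
            yield row[0], row[7]


# Made-up two-row example, NOT real GISTIC output
example = io.StringIO(
    "name\tdesc\twide\tpeak\tregion\tq\tresid_q\tbroad_focal\tamp_thresh\n"
    "Hypothetical Peak 1\t3p26\t-\t-\t-\t0.01\t0.01\tfocal\t-\n"
    "Hypothetical Peak 2\t3q21\t-\t-\t-\t0.02\t0.02\tbroad\t-\n"
)
for name, kind in broad_or_focal(example):
    print(name, kind)
```

For a real run, pass an open file handle for all_lesions.conf_XX.txt (with XX replaced by your confidence level) instead of the in-memory example.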

(reply written 5 months ago by Kevin Blighe)
7 weeks ago, achernia wrote:

If you still have questions about GISTIC output, please send them directly to us at the Broad Institute.

Andrew Cherniack achernia@broadinstitute.org
