Question: TCGA/Broad Institute CNV Files Segment Mean
dirigible2012 (European Union) wrote, 4.8 years ago:

Hello everybody,

I am trying to analyse CNV data from TCGA to get a measure of overall CNV per patient.

When I download the Level 3 files taken from the SNP6 array, there is a column in the file called Segment_Mean. (Example at bottom.)

What do the numbers in this column represent?

I think they might be log ratios, but the link below makes me wonder if they are direct estimates of copy number. (In which case it is puzzling that they aren't whole numbers.)

Thanks for any help,


Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    151040529    153927851    1558    0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153928595    153929981    2    -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153933585    164456865    7473    0.1883
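For the original goal of one CNV measure per patient, one possible option (not something TCGA defines for these files) is the fraction of the segmented genome whose |Segment_Mean| exceeds a cutoff. This is a minimal sketch that assumes rows shaped like the example, inclusive coordinates, and a hypothetical cutoff of 0.2. All three are assumptions to check against your own data.

```python
from collections import defaultdict

# Rows from the example: (sample, chromosome, start, end, num_probes, segment_mean)
ROWS = [
    ("BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936", "1", 151040529, 153927851, 1558, 0.2031),
    ("BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936", "1", 153928595, 153929981, 2, -2.0772),
    ("BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936", "1", 153933585, 164456865, 7473, 0.1883),
]

# Hypothetical cutoff: segments with |segment_mean| above this count as altered.
ALTERED_THRESHOLD = 0.2


def fraction_altered(rows, threshold=ALTERED_THRESHOLD):
    """Per sample, the fraction of segmented bases with |segment_mean| > threshold."""
    total = defaultdict(int)
    altered = defaultdict(int)
    for sample, _chrom, start, end, _probes, seg_mean in rows:
        length = end - start + 1  # assumes start and end are both inclusive
        total[sample] += length
        if abs(seg_mean) > threshold:
            altered[sample] += length
    return {sample: altered[sample] / total[sample] for sample in total}


for sample, frac in fraction_altered(ROWS).items():
    print(sample, round(frac, 3))
```

With these three rows and a cutoff of 0.2, the first two segments count as altered and the third (0.1883) does not. Weighting by base pairs rather than by Num_Probes is a design choice. A probe-weighted version would be an equally reasonable variant.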



Chris Miller (Washington University in St. Louis, MO) wrote, 4.8 years ago:

Those are log2 ratios of the tumor intensity to the normal intensity. To convert to an absolute copy number, use: (2^seg_mean) * 2
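The conversion in the answer can be written as a pair of small functions. This is a sketch that assumes a diploid normal (the default `normal_copies=2`); the function names are illustrative, not from any TCGA tool.

```python
import math


def segment_mean_to_copy_number(seg_mean, normal_copies=2):
    """Invert seg_mean = log2(tumor / normal), assuming normal_copies copies in the normal."""
    return normal_copies * 2 ** seg_mean


def copy_number_to_segment_mean(copy_number, normal_copies=2):
    """Forward direction: log2 ratio of a copy number to the assumed normal copy number."""
    return math.log2(copy_number / normal_copies)


# The three segments from the question's example file.
for seg_mean in (0.2031, -2.0772, 0.1883):
    print(seg_mean, round(segment_mean_to_copy_number(seg_mean), 2))
```

For the three example segments this gives roughly 2.30, 0.47 and 2.28 copies. Non-integer values are expected: the ratio is computed from averaged probe intensities, and tumor samples usually contain some normal cells.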


Thanks for the information.

May I ask why we should also multiply by 2 in "(2^seg_mean)*2" instead of using just "2^seg_mean"? Does this "2" represent the normal intensity?

(reply by Xinsen Xu, 4.0 years ago)

Right - the assumption is that the normal genome is diploid, so the normal copy number is 2.

(reply by Chris Miller, 4.0 years ago)
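Spelled out, the conversion given in the answers is the log2 ratio inverted, treating array intensity as proportional to copy number (an idealization) and fixing the normal copy number at 2. Here s is seg_mean, C_T the tumor copy number and C_N the normal copy number:

```latex
\begin{aligned}
s   &= \log_2 \frac{C_T}{C_N} \\
C_T &= C_N \cdot 2^{s} = 2 \cdot 2^{s} = 2^{s+1} \qquad (C_N = 2)
\end{aligned}
```

So s = 0 corresponds to 2 copies, s = 1 to 4 copies and s = -1 to 1 copy.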

If these are truly log2 ratios of the tumor CN to the normal CN, how can I see the following in the TCGA ACC cohort?

                         sample chromosome start      end num_probes segment_mean
1: TCGA-2H-A9GF-01A-11D-A37B-01          1 61735 15024591       7600       0.0713
2: TCGA-2H-A9GF-11A-11D-A37E-01          1 61735 17217907       8841       0.0124

The second row is supposed to be the matched healthy normal (11A denotes healthy normal tissue) from the same donor as the first row. By your definition, shouldn't its segment_mean be 0? What is this sample compared against to compute the segment_mean shown here?

(reply by Maarten Slagter, 3.5 years ago)
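The sample type that distinguishes those two rows sits at the start of the fourth field of the TCGA barcode. Under the TCGA code tables, 01-09 are tumor types, 10-19 normal types and 20-29 controls, and the first three fields identify the participant. A minimal sketch for labelling rows that way (the helper names are hypothetical). Note that it only tells you which sample a row came from, not which reference its segment_mean was computed against.

```python
def _fields(barcode):
    """Split a TCGA barcode into its dash-separated fields, checking the prefix."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return fields


def participant_id(barcode):
    """First three fields (e.g. TCGA-2H-A9GF), shared by all samples from one donor."""
    return "-".join(_fields(barcode)[:3])


def sample_type_code(barcode):
    """Two-digit sample type at the start of the fourth field (01A -> 1, 11A -> 11)."""
    return int(_fields(barcode)[3][:2])


def sample_category(barcode):
    """Group sample type codes: 01-09 tumor, 10-19 normal, 20-29 control."""
    code = sample_type_code(barcode)
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "other"


for bc in ("TCGA-2H-A9GF-01A-11D-A37B-01", "TCGA-2H-A9GF-11A-11D-A37E-01"):
    print(bc, participant_id(bc), sample_category(bc))
```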

You'll have to consult the metadata or file description to see exactly how the files you're using were generated. CN calling can also be done against a reference pool of samples rather than a matched normal. There may also be other files in that data dump that contain the matched tumor/normal data.

(reply by Chris Miller, 3.5 years ago)

I had the same question.

(reply by Ming Tang, 2.3 years ago)
Zayni1234 wrote, 2.1 years ago:

I have a question, if you could please reply: to convert sample names such as BUBBY_p_TCGA_b89_105_SNP_N_GenomeWideSNP_6_D10_777410 to TCGA barcodes such as TCGA-2H-A9GF-01A-11D-A37B-01, do we have to do it manually before running GISTIC? Thanks.

kingsire wrote, 2.0 years ago:

I am also wondering how you converted sample IDs such as "FLOUT_p_TCGAb60_SNP_N_GenomeWideSNP_6_C05_681024" to TCGA barcodes such as "TCGA-2H-A9GF-01A-11D-A37B-01", which is essential for the next analysis. Could you please explain how you solved this? Thanks a lot.
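The thread does not say where such a mapping comes from. Assuming you already have a tab-separated table pairing each array file name with its TCGA barcode (the column names `array_name` and `barcode` and the placeholder values below are hypothetical), relabelling a segment file is a dictionary lookup. A minimal sketch:

```python
import csv
import io


def load_mapping(handle):
    """Read a tab-separated mapping with hypothetical columns array_name and barcode."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["array_name"]: row["barcode"] for row in reader}


def relabel_segments(seg_handle, mapping, out_handle):
    """Copy a tab-separated segment file, replacing the first (Sample) column with barcodes.
    Raises KeyError if a sample name has no entry in the mapping."""
    reader = csv.reader(seg_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))  # keep the header unchanged
    for row in reader:
        if not row:
            continue  # skip blank lines
        row[0] = mapping[row[0]]
        writer.writerow(row)


# Placeholder data only; these are not real TCGA identifiers.
mapping_tsv = "array_name\tbarcode\nEXAMPLE_ARRAY_A01\tTCGA-XX-XXXX-01A-01D-XXXX-01\n"
seg_tsv = "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\nEXAMPLE_ARRAY_A01\t1\t100\t200\t5\t0.1\n"
out = io.StringIO()
relabel_segments(io.StringIO(seg_tsv), load_mapping(io.StringIO(mapping_tsv)), out)
print(out.getvalue())
```

Unmatched sample names raise a KeyError on purpose, so a missing mapping entry is not silently passed on to GISTIC or any other downstream tool.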
