TCGA/Broad Institute CNV Files Segment Mean
7.7 years ago
dirigible2012

Hello everybody,

I am trying to analyse CNV data from TCGA to get a measure of overall CNV per patient.

When I download the Level 3 files taken from the SNP6 array, there is a column in the file called Segment_Mean. (Example at bottom.)

What do the numbers in this column represent?

I think they might be log ratios, but the link below makes me wonder whether they are direct estimates of copy number (in which case it is puzzling that they aren't whole numbers).

Thanks for any help,

Stephanie

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    151040529    153927851    1558    0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153928595    153929981    2    -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153933585    164456865    7473    0.1883
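For the per-patient summary mentioned at the top, one common measure is the fraction of the genome altered. A minimal sketch, assuming the tab-separated layout shown above, inclusive 1-based coordinates, and an arbitrary |Segment_Mean| cutoff of 0.2 (the threshold, the function name and the shortened sample name "S1" are my own choices, not a TCGA definition):

```python
import csv
import io
from collections import defaultdict

# Assumption: |Segment_Mean| above this counts as "altered". 0.2 is an
# illustrative cutoff, not a value defined by TCGA or the Broad.
THRESHOLD = 0.2


def fraction_genome_altered(seg_text, threshold=THRESHOLD):
    """Return {sample: altered_bp / covered_bp} for a tab-separated segment file.

    Assumes inclusive, 1-based Start/End coordinates; the denominator is the
    total length of segments present in the file, not the whole genome.
    """
    altered = defaultdict(int)
    covered = defaultdict(int)
    for row in csv.DictReader(io.StringIO(seg_text), delimiter="\t"):
        length = int(row["End"]) - int(row["Start"]) + 1
        covered[row["Sample"]] += length
        if abs(float(row["Segment_Mean"])) > threshold:
            altered[row["Sample"]] += length
    return {sample: altered[sample] / covered[sample] for sample in covered}


# The three example rows from the question, with the sample name shortened.
example = "\n".join([
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean",
    "S1\t1\t151040529\t153927851\t1558\t0.2031",
    "S1\t1\t153928595\t153929981\t2\t-2.0772",
    "S1\t1\t153933585\t164456865\t7473\t0.1883",
])
print(fraction_genome_altered(example))  # roughly {'S1': 0.215}
```

The result depends heavily on the cutoff: 0.1883 falls just below it and 0.2031 just above, so it is worth checking how sensitive your per-patient ranking is to this choice.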

7.7 years ago

These are log2 ratios of the tumor intensity to the normal intensity. To convert to an absolute copy number, use: (2^seg_mean)*2
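As a quick illustration, here is that formula as a small Python function (the name and the `ploidy` parameter are my own; `ploidy=2` is the diploid assumption behind the "*2"), applied to the three example segment means from the question. It also shows why the values aren't whole numbers: a log2 ratio such as 0.2031 maps to about 2.3 copies.

```python
def segment_mean_to_cn(seg_mean, ploidy=2):
    """Convert a log2(tumor/normal) segment mean to an absolute copy number.

    Implements (2 ** seg_mean) * 2; `ploidy` is the assumed copy number of
    the reference (2 for a diploid normal).
    """
    return ploidy * 2 ** seg_mean


# Segment means from the example rows in the question.
for seg_mean in (0.2031, -2.0772, 0.1883):
    print(seg_mean, round(segment_mean_to_cn(seg_mean), 2))
```

This prints roughly 2.3, 0.47 and 2.28 copies for the three segments, so the short -2.0772 segment looks like a likely deletion while the other two sit slightly above two copies.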


Thanks for the information.

May I ask why we also multiply by 2 in "(2^seg_mean)*2", instead of using just "2^seg_mean"? Does this "2" represent the normal intensity?


Right: the assumption is that the normal genome is diploid.
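Spelled out, assuming array intensity $I$ is roughly proportional to copy number $CN$ and the normal reference has two copies ($CN_N = 2$):

```latex
\[
\mathrm{seg\_mean} = \log_2 \frac{I_T}{I_N} \approx \log_2 \frac{CN_T}{CN_N}
\quad\Longrightarrow\quad
CN_T \approx CN_N \cdot 2^{\mathrm{seg\_mean}} = 2 \cdot 2^{\mathrm{seg\_mean}}
\]
```

So a segment mean of 0 corresponds to two copies, 1 to four copies and -1 to one copy; the factor of 2 is the reference copy number, not an intensity.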


If these are truly log2 ratios of tumor CN to normal CN, how can I see the following in the TCGA ACC cohort?

                         sample chromosome start      end num_probes segment_mean
1: TCGA-2H-A9GF-01A-11D-A37B-01          1 61735 15024591       7600       0.0713
2: TCGA-2H-A9GF-11A-11D-A37E-01          1 61735 17217907       8841       0.0124


The second line is supposed to be the matched normal (11A denotes healthy normal tissue) from the same donor as the first line. By your definition, shouldn't its segment mean be 0? What reference is this sample compared against to compute its segment_mean?
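The tumor/normal reading of those two rows can be checked from the barcodes themselves. A small sketch (the function names are hypothetical) that extracts the participant and classifies the two-digit sample-type code in the fourth field; the 01-09 tumor, 10-19 normal, 20-29 control ranges follow the TCGA barcode convention, in which 11 is solid tissue normal:

```python
def participant_id(barcode):
    """First three fields of a TCGA barcode, e.g. 'TCGA-2H-A9GF' (the donor)."""
    return "-".join(barcode.split("-")[:3])


def sample_type(barcode):
    """Classify a TCGA barcode by the two-digit code in its fourth field.

    Ranges follow the TCGA barcode convention: 01-09 tumor,
    10-19 normal, 20-29 control; anything else is reported as "other".
    """
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "other"


for bc in ("TCGA-2H-A9GF-01A-11D-A37B-01", "TCGA-2H-A9GF-11A-11D-A37E-01"):
    print(bc, participant_id(bc), sample_type(bc))
```

Both rows share the participant TCGA-2H-A9GF, with one classified as tumor and the other as normal, which is what makes the nonzero segment mean on the normal row worth questioning.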


You'll have to check the metadata or file description to see exactly how these files were generated. CN calling can also be done against a reference pool of samples rather than a matched normal. There may also be other files in that data dump that contain the matched tumor/normal data.
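To illustrate why a normal sample need not come out at exactly 0: if the reference is a pool of other arrays rather than the donor's own normal, even a normal sample's ratios reflect how it differs from that pool. Below is a toy sketch of pooled-reference log2 ratios. It is not the Broad/TCGA pipeline, whose details aren't given here; the intensities are made up, and real pipelines also normalize and then segment these per-probe ratios.

```python
import statistics
from math import log2


def log2_ratio_vs_pool(sample, pool):
    """Per-probe log2(sample / median of a reference pool).

    sample: probe intensities for one array.
    pool:   list of arrays (same probe order) used as the reference.
    """
    # Median intensity of each probe across the reference arrays.
    reference = [statistics.median(probe) for probe in zip(*pool)]
    return [log2(value / ref) for value, ref in zip(sample, reference)]


pool = [[100, 100, 100], [90, 110, 100], [110, 90, 100]]
print(log2_ratio_vs_pool([100, 200, 50], pool))  # [0.0, 1.0, -1.0]
```

With a matched normal as the reference, the normal sample would be compared with itself and its ratios would be 0 by construction; with a pooled reference they only approach 0.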


Could you please tell me how to transform such segment files into gene-level copy number variation files?
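One simple approach, sketched here under my own assumptions: give each gene the length-weighted mean Segment_Mean of the segments that overlap it. The segments below are the example rows from the question; the gene names and coordinates are hypothetical placeholders. In practice you would load real gene coordinates for the same genome build as the segment file.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    seg_mean: float


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def gene_level(segments, genes):
    """Length-weighted mean segment value per gene; None if no segment overlaps."""
    result = {}
    for gene in genes:
        weighted, covered = 0.0, 0
        for seg in segments:
            if seg.chrom != gene.chrom:
                continue
            # Number of bases shared by the segment and the gene (inclusive).
            overlap = min(seg.end, gene.end) - max(seg.start, gene.start) + 1
            if overlap > 0:
                weighted += overlap * seg.seg_mean
                covered += overlap
        result[gene.name] = weighted / covered if covered else None
    return result


segments = [
    Segment("1", 151040529, 153927851, 0.2031),
    Segment("1", 153928595, 153929981, -2.0772),
    Segment("1", 153933585, 164456865, 0.1883),
]
genes = [  # hypothetical genes, for illustration only
    Gene("GENE_A", "1", 152000000, 152100000),
    Gene("GENE_B", "1", 153927000, 153929000),
    Gene("GENE_C", "2", 1000000, 1100000),
]
print(gene_level(segments, genes))
```

GENE_B straddles a breakpoint, so its value blends two segments. Whether to average, take the most extreme segment, or flag such genes is a design choice. The nested loop is fine for a sketch, but for whole-genome data you would want sorted segments or an interval index.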

5.1 years ago
Zayni1234

How do we convert sample names such as BUBBY_p_TCGA_b89_105_SNP_N_GenomeWideSNP_6_D10_777410 to TCGA barcodes such as TCGA-2H-A9GF-01A-11D-A37B-01?

Do we have to do this manually before running GISTIC?

Thanks

5.0 years ago
kingsire

I am also wondering how you converted sample IDs such as FLOUT_p_TCGAb60_SNP_N_GenomeWideSNP_6_C05_681024 to TCGA barcodes such as TCGA-2H-A9GF-01A-11D-A37B-01, which is essential for the next analysis. Could you please explain how you solved this? Thanks a lot.
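The thread doesn't say how the mapping was done. Whatever the source, it comes down to a lookup table that lists both identifiers for each array. TCGA metadata files, such as the MAGE-TAB SDRF in the legacy archives, are one place to look, but check that against your own download. Here is a minimal sketch of applying such a table. The function names are hypothetical, and the column names must be taken from your own metadata file.

```python
import csv
import io


def load_id_map(mapping_text, array_col, barcode_col):
    """Build {array_name: barcode} from a tab-separated metadata table.

    array_col / barcode_col are whatever your metadata file calls them.
    """
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter="\t")
    return {row[array_col]: row[barcode_col] for row in reader}


def relabel_samples(seg_text, id_map, sample_col="Sample"):
    """Replace array names in a tab-separated segment file with TCGA barcodes.

    Names missing from id_map are kept unchanged so they are easy to spot.
    """
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[sample_col] = id_map.get(row[sample_col], row[sample_col])
        writer.writerow(row)
    return out.getvalue()
```

Before running GISTIC, it is worth checking for any sample names that still don't start with "TCGA-", since those are arrays the mapping table didn't cover.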