Question

Have the copy numbers (or segment means) in the "minus_germline" TCGA datasets already been compared to normal tissue data?

6.9 years ago

bddesanctis • 0

First of all, you can find the dataset I am referring to by going to http://gdac.broadinstitute.org/, clicking "Browse" under the Data column (I am using HNSC, but any dataset works), and downloading the file genome_wide_snp_6segmented_scna_minus_germline_cnv_hg19 under the heading "SNP6 CopyNum" in the pop-up. The data has a column of segment means, which you can transform to copy numbers with 2*2^(segment mean).
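For concreteness, here is a minimal Python sketch of that transformation. The function name is only illustrative, and it assumes the segment mean is a log2 ratio relative to a diploid (copy number 2) baseline.

```python
def seg_mean_to_copy_number(seg_mean):
    """Convert a log2-ratio segment mean to an absolute copy number.

    Assumes a diploid baseline, i.e. a segment mean of 0 means 2 copies.
    """
    return 2.0 * 2.0 ** seg_mean


# A segment mean of 0 gives 2 copies, +1 gives 4 copies, -1 gives 1 copy.
for value in (0.0, 1.0, -1.0):
    print(value, seg_mean_to_copy_number(value))
```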

I am looking to find regions of the genome with copy numbers that are amplified or deleted in cancer, when compared to normal tissue. I was under the impression that "minus germline" meant the data had already been standardized against normal tissue, so I have averaged the tumour (TP, aka "01A") samples over the entire genome. However, for most of the genome, this didn't result in many significant departures from a copy number of 2. Have I done something wrong? More precisely:

Has this data already been standardized against normal tissue? That is, if a tumour sample has a copy number bigger than 2, does this mean it is an amplification in comparison to normal tissue?
If not, how would I go about standardizing the values? Most of the tumour samples have matching normal samples, but I am unsure whether to subtract segment means, take a ratio of segment means, subtract copy numbers, or take a ratio of copy numbers. Different sources have done different things.

I realize there is also the matter of the amplification/deletion thresholds (usually 0.2 and -0.2), etc. Comments and suggestions on that are appreciated as well, but my primary question is about the format of the data.
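To illustrate the thresholds mentioned above, here is a small Python sketch that labels a segment from its segment mean. The function name and the default cut-offs of 0.2 and -0.2 are assumptions taken from the "usual" values quoted in this post, not values fixed by the dataset.

```python
def classify_segment(seg_mean, amp_threshold=0.2, del_threshold=-0.2):
    """Label a segment by comparing its log2-ratio segment mean to cut-offs.

    The default thresholds are the commonly quoted 0.2 / -0.2; adjust as needed.
    """
    if seg_mean > amp_threshold:
        return "amplification"
    if seg_mean < del_threshold:
        return "deletion"
    return "neutral"


# In copy-number terms, 0.2 and -0.2 correspond to roughly 2.30 and 1.74 copies.
print(2.0 * 2.0 ** 0.2, 2.0 * 2.0 ** -0.2)
```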

Thank you!

copy number tcga cnv • 4.1k views

updated 6.9 years ago by Persistent LABS ▴ 750 • written 6.9 years ago by bddesanctis • 0

score 1 · Accepted Answer · 2017-06-12

TCGA uses a reference set of normal samples for tangent normalization of the raw intensity values, followed by segmentation with the CBS (circular binary segmentation) algorithm. This reference set is different from both the matched normal (blood-derived sample from the patient) and the Normal samples (Solid Tissue Normal). Find more here

Tumor samples, their corresponding matched normal samples (Blood Derived Normal), and Normal samples (Solid Tissue Normal) are all normalized against this reference sample set.

Tumor samples normalized w.r.t. Reference set
Matched Normal samples normalized w.r.t. Reference set
Normal samples normalized w.r.t. Reference set

Since the segment mean values are already normalized (log2 ratios against the reference set), you can use them directly. There is no need to compute tumor minus normal.
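If you only want the tumor segments, one option is to select them by the sample-type code in the TCGA barcode. The following is a rough Python sketch, not an official tool: the helper names, the `Sample` column name, and the example barcodes and values are all made up for illustration, so check them against the header of your own file.

```python
# TCGA sample-type codes: first two digits of the fourth barcode field.
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}


def sample_type(barcode):
    """Return the sample type encoded in a TCGA barcode (e.g. '01A' -> tumor)."""
    code = barcode.split("-")[3][:2]
    return SAMPLE_TYPES.get(code, "Other")


def tumour_rows(rows):
    """Keep only rows whose 'Sample' barcode is a primary solid tumor."""
    return [r for r in rows if sample_type(r["Sample"]) == "Primary Solid Tumor"]


# Hypothetical rows; the segment means are used as they are, with no
# tumor-minus-normal step.
rows = [
    {"Sample": "TCGA-XX-0001-01A-01D-0000-01", "Segment_Mean": 0.45},
    {"Sample": "TCGA-XX-0001-10A-01D-0000-01", "Segment_Mean": 0.01},
    {"Sample": "TCGA-XX-0002-11A-01D-0000-01", "Segment_Mean": -0.02},
]
print(tumour_rows(rows))
```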

The masked copy number segment files are the same as the unmasked files, except that a filtering step removes the Y chromosome and probe sets previously shown to have frequent germline copy-number variation. Because probes are removed, the segment means change.

Hope this clears things up.