Question

TCGA CNV Files

5

9.6 years ago

dirigible2012 ▴ 320

Hello all,

I am looking at the Level 3 CNV files on TCGA - the ones generated using SNP microarrays. I have a few questions:

How is 'segment mean' calculated and what is the exact biological interpretation?
For each patient I have two files called e.g. ....hg19.seg.txt and ...nocnv_hg19.seg.txt. What does each file contain, and which should I be using?

Thanks for any help,

Stephanie

CNV TCGA microarray • 11k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 9.6 years ago by dirigible2012 ▴ 320

Ram · Accepted Answer · 2014-09-03

8

9.6 years ago

Cyriac Kandoth 6.0k

Here is the best documentation (that I know of) for TCGA SNP-array-based CNV data. Regarding your two questions:

The CBS (circular binary segmentation) algorithm identifies regions of the genome that, in spite of noise, probably have a uniform underlying copy number. The "segment mean" of each region is reported in the Level 3 file and can be used as the estimated copy-number (CN) ratio for the segment.
"nocnv" just means that germline CN variations are removed. In TCGA, they ran SNP arrays on normal tissue too.
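As a rough illustration of how a segment mean might be read, here is a minimal sketch. It assumes the value is a log2 ratio against a diploid reference, i.e. log2(copy number / 2); the function name `segment_mean_to_copies` and the `ploidy` parameter are hypothetical and not part of any TCGA tool. Tumor purity and subclonality mean real values rarely land on exact integers.

```python
def segment_mean_to_copies(segment_mean: float, ploidy: int = 2) -> float:
    """Back-transform a log2 copy-number ratio into an estimated copy number.

    Assumption: segment_mean = log2(copies / ploidy), with a diploid reference.
    """
    return ploidy * 2.0 ** segment_mean


# A segment mean of 0 corresponds to the reference state (2 copies),
# positive values suggest gain, negative values suggest loss.
for mean in (0.0, 1.0, -1.0):
    print(mean, segment_mean_to_copies(mean))
```

Rounding the result gives a crude integer estimate, which is only sensible for fairly pure tumor samples.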

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by Cyriac Kandoth 6.0k

2

Thanks, that's great. Thank you for the great primer on MAF files as well.

ADD REPLY • link 9.6 years ago by dirigible2012 ▴ 320

0

You're very welcome. :)

ADD REPLY • link 9.6 years ago by Cyriac Kandoth 6.0k

0

Hi Cyriac,

Why are there normal samples listed as "nocnv" under Level 3 data? If they are germline-CN filtered, then shouldn't only the tumor samples be listed?

ADD REPLY • link 7.6 years ago by Hima • 0

0

Can you create a new post for your question, and clarify it with an example of the normal samples you see listed as nocnv?

ADD REPLY • link 7.6 years ago by Cyriac Kandoth 6.0k

0

I'm still a bit confused about the Copy Number ratios. I want to calculate a whole-genome measure of CNV, and I was thinking of taking the total number of bases duplicated or deleted. How would I calculate this from the segment_mean? Is it a direct ratio or a log ratio?

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by dirigible2012 ▴ 320

0

For the total number of bases duplicated or deleted, you don't need to use the copy number ratios (which are given as log2 ratios, by the way). Simply subtract the coordinates, end - start. The copy number ratio instead tells you how much the segment deviates from normal; it can be back-transformed from log2 and rounded to estimate the integer number of copies.
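One way to turn this into a whole-genome measure is sketched below. It assumes a tab-separated segment file with `Start`, `End`, and `Segment_Mean` columns (check the header of your own files), and it uses a hypothetical cutoff of ±0.3 on the log2 ratio to decide which segments count as gained or lost. Both the cutoff and the function name `altered_bases` are illustrative choices, not TCGA conventions.

```python
import csv
import io


def altered_bases(seg_text: str, threshold: float = 0.3) -> tuple[int, int]:
    """Sum segment lengths (End - Start) for gained and lost segments.

    Assumptions: tab-separated input with Start, End and Segment_Mean
    columns; Segment_Mean is a log2 ratio; `threshold` is an arbitrary cutoff.
    """
    gained = lost = 0
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    for row in reader:
        # int(float(...)) tolerates coordinates written in scientific notation
        length = int(float(row["End"])) - int(float(row["Start"]))
        mean = float(row["Segment_Mean"])
        if mean >= threshold:
            gained += length
        elif mean <= -threshold:
            lost += length
    return gained, lost


# Made-up toy segments, not real TCGA data
example = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "S1\t1\t1000\t5000\t10\t0.9\n"
    "S1\t1\t5000\t9000\t12\t0.02\n"
    "S1\t2\t2000\t3500\t8\t-1.1\n"
)
print(altered_bases(example))  # (4000, 1500)
```

Dividing the two totals by the length of the genome covered by the segments would give a fraction-of-genome-altered style summary.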

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.1 years ago by Ömer An ▴ 260

0

Hello, I want to know how TCGA generates the "nocnv" file. I downloaded "masked copy number" data from TCGA, and the description says: "Masked copy number segments are generated with the same method except that a filtering step is performed that removes Y chromosome and probe sets that were previously indicated to have frequent germline copy-number variation." Is the "masked copy number" file the same as the "nocnv" file? If not, how should I generate a "nocnv" file from the "masked copy number" data?

ADD REPLY • link 5.5 years ago by Shixiang ▴ 100