1. The CBS (circular binary segmentation) algorithm identifies regions of the genome that, despite noise, probably share a uniform underlying copy number. The "segment mean" of each region is reported in the Level 3 file and can be used as the estimated CN ratio (a log2 ratio) for the segment.

2. "nocnv" just means that germline CN variations are removed. In TCGA, they ran SNP arrays on normal tissue too.

I'm still a bit confused about the Copy Number ratios. I want to calculate a whole-genome measure of CNV, and I was thinking of taking the total number of bases duplicated or deleted. How would I calculate this from the segment_mean? Is it a direct ratio or a log ratio?

For the total number of bases duplicated or deleted, you don't need the copy number ratios themselves (which are given as log2 ratios, by the way). Simply subtract the coordinates, end - start, to get the length of each segment. The copy number ratio instead tells you how much the segment deviates from normal; it can be back-transformed from log2 (2 × 2^segment_mean for a diploid reference) and rounded to estimate the integer number of copies.
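To make that concrete, here is a minimal sketch in Python, not an official TCGA method. It assumes each segment carries a start, an end, and a log2 `segment_mean`, and that you only want to count segments whose log2 ratio departs from zero by more than a cutoff. The `Segment` type, the function names, the example values, and the 0.2 cutoff are all made up for illustration; pick a cutoff that suits your data. The back-transform also assumes a diploid reference and ignores tumor purity.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """One row of a segment file (hypothetical field names)."""
    chrom: str
    start: int
    end: int
    segment_mean: float  # log2(tumor / normal) copy number ratio


def estimated_copies(segment_mean: float, ploidy: int = 2) -> int:
    """Back-transform a log2 ratio and round to an integer copy number."""
    return round(ploidy * 2 ** segment_mean)


def altered_bases(segments: Iterable[Segment],
                  threshold: float = 0.2) -> Tuple[int, int]:
    """Return (bases gained, bases lost) over all segments.

    A segment counts as gained or lost when its log2 ratio is beyond
    +/- threshold. The threshold is an illustrative assumption.
    """
    gained = lost = 0
    for seg in segments:
        length = seg.end - seg.start  # segment length in bases
        if seg.segment_mean > threshold:
            gained += length
        elif seg.segment_mean < -threshold:
            lost += length
    return gained, lost


# Made-up example segments, not real TCGA data.
segments = [
    Segment("1", 1_000_000, 3_000_000, 0.58),   # roughly one extra copy
    Segment("1", 3_000_000, 8_000_000, 0.01),   # roughly normal
    Segment("2", 500_000, 1_500_000, -1.0),     # roughly one copy lost
]

gained, lost = altered_bases(segments)
print(gained, lost)
print([estimated_copies(s.segment_mean) for s in segments])
```

Dividing `gained + lost` by the total length of all segments would give a fraction-of-genome-altered style measure, if a normalized value is more useful than a raw base count.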

Hello, I want to know how TCGA generates the "nocnv" file. I downloaded the "masked copy number" data from TCGA, and the description says: "Masked copy number segments are generated with the same method except that a filtering step is performed that removes Y chromosome and probe sets that were previously indicated to have frequent germline copy-number variation." Is the "masked copy number" file the same as the "nocnv" file? If not, how should I generate a "nocnv" file from the "masked copy number" data?

Thanks, that's great. Thank you for the great primer on MAF files as well.

You're very welcome. :)

Hi Cyriac,

Why are there normal samples listed as "nocnv" under Level 3 data? If they are filtered for germline CN variation, then shouldn't only the tumor samples be listed?

Can you create a new post for your question, and clarify it with an example of the normal samples you see listed as nocnv?
