Question: TCGA CNV Files
dirigible2012 (European Union) wrote, 4.6 years ago:

Hello all,

I am looking at the Level 3 CNV files on TCGA - the ones generated using SNP microarrays. I have a few questions:

1. How is 'segment mean' calculated and what is the exact biological interpretation?

2. For each patient I have two files called e.g. ....hg19.seg.txt and nocnv_hg19.seg.txt. What does each file contain, and which should I be using?

Thanks for any help,

Stephanie

Tags: cnv, microarray, tcga
Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote, 4.6 years ago:

Here is the best documentation (that I know of) for TCGA SNP-array based CNV data. Regarding your two questions:

1. The CBS (circular binary segmentation) algorithm identifies regions of the genome that, despite noise, probably share a uniform underlying copy number. The "segment mean" of each region, i.e. the mean of the probe-level log2 copy-number ratios within that segment, is reported in the Level 3 file and can be used as the estimated copy-number ratio for the segment.

2. "nocnv" just means that germline copy-number variations (CNVs) have been removed. (In TCGA, SNP arrays were run on normal tissue too.)
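To make the "segment mean" in item 1 concrete, here is a minimal Python sketch. It assumes you already have probe-level log2 ratios for one chromosome and the breakpoints a segmentation algorithm reported (the values and the helper name `segment_means` are made up for illustration). It only shows the averaging step; it does not implement CBS itself, which is usually run with a tool such as the Bioconductor package DNAcopy.

```python
from statistics import mean


def segment_means(log2_ratios, breakpoints):
    """Average probe-level log2 ratios within each segment.

    log2_ratios: per-probe log2 copy-number ratios, ordered along a chromosome.
    breakpoints: indices where a new segment starts (excluding 0), as a
    segmentation algorithm such as CBS would report them (hypothetical input).
    """
    bounds = [0, *breakpoints, len(log2_ratios)]
    # One mean per [start, end) slice between consecutive boundaries.
    return [mean(log2_ratios[start:end]) for start, end in zip(bounds, bounds[1:])]


# Made-up probe values: a copy-neutral stretch, then a stretch near
# log2(3/2) ~ 0.585, which is what a one-copy gain on a diploid genome looks like.
probes = [0.02, -0.05, 0.01, 0.60, 0.55, 0.62]
print(segment_means(probes, [3]))
```

Real segment means will be pulled toward zero by tumor impurity, so a one-copy gain often shows up below 0.585.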

(answer edited 2.6 years ago)

Thanks, that's great. Thank you for the great primer on MAF files as well.

(reply by dirigible2012, 4.6 years ago)

You're very welcome. :)

(reply by Cyriac Kandoth, 4.6 years ago)

Hi Cyriac,

Why are there normal samples listed as "nocnv" under Level 3 data? If the files are filtered for germline CNVs, shouldn't only the tumor samples be listed?

(reply by Hima, 2.6 years ago)

Can you create a new post for your question, and clarify it with an example of the normal samples you see listed as nocnv?

(reply by Cyriac Kandoth, 2.6 years ago)

I'm still a bit confused about the copy-number ratios. I want to calculate a whole-genome measure of CNV, and I was thinking of taking the total number of bases duplicated or deleted. How would I calculate this from the segment mean? Is it a direct ratio or a log ratio?

(reply by dirigible2012, 4.6 years ago)

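For the total number of bases duplicated or deleted, you don't need the copy-number ratios to get the length (by the way, they are given as log2 ratios): simply subtract the coordinates, end - start (+1 if the coordinates are 1-based and inclusive). Use the segment mean only to decide whether a segment counts as a gain or a loss. The copy-number ratio tells you how much the segment deviates from normal; back-transforming it (2 x 2^segment_mean, assuming a diploid baseline) and rounding gives an estimate of the integer copy number.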
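A minimal Python sketch of the approach above, assuming a tab-separated file with the columns Sample, Chromosome, Start, End, Num_Probes, Segment_Mean, 1-based inclusive coordinates, and Segment_Mean = log2(copy_number / 2). The rows below are made up, and the ±0.3 gain/loss cutoff is a hypothetical choice, not a TCGA standard.

```python
import csv
import io

# Made-up rows in the tab-separated layout of a Level 3 .seg.txt file.
SEG_TXT = """Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean
S1\t1\t1000\t50999\t40\t0.02
S1\t1\t51000\t80999\t25\t0.58
S1\t2\t100\t20099\t18\t-1.10
"""

# Hypothetical cutoff on the log2 ratio; tune it for your data (purity, noise).
THRESHOLD = 0.3


def segment_length(start, end):
    # Assumes 1-based, inclusive coordinates.
    return end - start + 1


def estimated_copies(segment_mean):
    # Assumes Segment_Mean = log2(copy_number / 2), i.e. a diploid baseline.
    # Purity and ploidy make this only a rough estimate.
    return round(2 * 2 ** segment_mean)


def altered_bases(rows, threshold=THRESHOLD):
    """Sum the lengths of segments called as gained or lost."""
    gained = lost = 0
    for row in rows:
        length = segment_length(int(row["Start"]), int(row["End"]))
        seg_mean = float(row["Segment_Mean"])
        if seg_mean >= threshold:
            gained += length
        elif seg_mean <= -threshold:
            lost += length
    return gained, lost


rows = list(csv.DictReader(io.StringIO(SEG_TXT), delimiter="\t"))
gained, lost = altered_bases(rows)
print(gained, lost)
print([estimated_copies(float(r["Segment_Mean"])) for r in rows])
```

For a real file, replace `io.StringIO(SEG_TXT)` with `open(path, newline="")`. Dividing `gained + lost` by the total length of all segments gives one possible whole-genome fraction.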

(reply by bounlu, 4.1 years ago)

Hello, I want to know how TCGA generated the "nocnv" files. I downloaded "masked copy number" data from TCGA, and its description says: "Masked copy number segments are generated with the same method except that a filtering step is performed that removes Y chromosome and probe sets that were previously indicated to have frequent germline copy-number variation." Is the "masked copy number" file the same as the "nocnv" file? If not, how should I generate a "nocnv" file from the "masked copy number" data?
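The filtering step in the quoted description can be pictured with a minimal Python sketch. It assumes probe-level data as `Probe` records and a set of probe IDs flagged for frequent germline CNV. Both are hypothetical stand-ins, since the actual probe lists are not given in this thread, and this does not claim to reproduce TCGA's pipeline. Segmentation would then be rerun on the surviving probes.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    probe_id: str
    chromosome: str
    position: int
    log2_ratio: float


# Adjust to however your file labels chromosome Y (assumption).
Y_LABELS = {"Y", "chrY"}


def mask_probes(probes, germline_cnv_probe_ids):
    """Drop chromosome Y probes and probes flagged as frequent germline CNVs."""
    return [
        p for p in probes
        if p.chromosome not in Y_LABELS
        and p.probe_id not in germline_cnv_probe_ids
    ]


# Made-up probes and a made-up germline-CNV probe list.
probes = [
    Probe("SNP_A-0001", "1", 10_000, 0.03),
    Probe("CN_0002", "1", 20_000, -0.40),
    Probe("SNP_A-0003", "Y", 30_000, -0.90),
]
flagged = {"CN_0002"}
kept = mask_probes(probes, flagged)
print([p.probe_id for p in kept])
```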

(reply by Shixiang, 5 months ago)
Powered by Biostar version 2.3.0