TCGA CNV Files
9.6 years ago
dirigible2012 ▴ 320

Hello all,

I am looking at the Level 3 CNV files on TCGA - the ones generated using SNP microarrays. I have a few questions:

  1. How is 'segment mean' calculated and what is the exact biological interpretation?
  2. For each patient I have two files called e.g. ....hg19.seg.txt and ...nocnv_hg19.seg.txt. What does each file contain, and which should I be using?

Thanks for any help,

Stephanie

CNV TCGA microarray • 11k views
9.6 years ago

Here is the best documentation (that I know of) for TCGA SNP-array based CNV data. Regarding your two questions:

  1. The CBS (circular binary segmentation) algorithm identifies regions of the genome that, in spite of noise, probably have a uniform underlying copy number. The "segment mean" of each region is reported in the Level 3 file and can be used as the estimated copy-number ratio for the segment (reported on a log2 scale).
  2. "nocnv" just means that germline CN variations are removed. In TCGA, they ran SNP arrays on normal tissue too.

Thanks, that's great. Thank you for the great primer on MAF files as well.


You're very welcome. :)


Hi Cyriac,

Why are there normal samples listed as "nocnv" under Level 3 data? If they are filtered for germline CN variation, then shouldn't only the tumor samples be listed?


Can you create a new post for your question, and clarify it with an example of the normal samples you see listed as nocnv?


I'm still a bit confused about the Copy Number ratios. I want to calculate a whole-genome measure of CNV, and I was thinking of taking the total number of bases duplicated or deleted. How would I calculate this from the segment_mean? Is it a direct ratio or a log ratio?


For the total number of bases duplicated or deleted, you don't need the copy number ratio itself (which is given as a log2 ratio, by the way). Simply subtract the coordinates, end - start, for each segment. The copy number ratio instead tells you how much the segment deviates from normal; it can be back-transformed from log2 and rounded to estimate the integer number of copies.
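To make that concrete, here is a minimal Python sketch of both calculations. It rests on several assumptions that are not stated in this thread: that the seg file is tab-delimited with columns named Start, End and Segment_Mean, that the segment mean is log2(copies / 2) relative to a diploid baseline, and that a segment counts as gained or lost once its segment mean passes a cutoff (the 0.3 used here is a hypothetical value, not a TCGA-defined one). The demo rows are invented purely for illustration.

```python
import csv
import io

# Hypothetical cutoff on the log2 segment mean; choose one that suits your data.
THRESHOLD = 0.3


def summarize_segments(seg_text, threshold=THRESHOLD):
    """Sum the bases in gained/lost segments and estimate integer copies.

    Assumes tab-delimited text with Start, End and Segment_Mean columns,
    and that Segment_Mean = log2(copies / 2) for a diploid baseline.
    """
    gained_bases = 0
    lost_bases = 0
    copies = []
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    for row in reader:
        start = int(row["Start"])
        end = int(row["End"])
        seg_mean = float(row["Segment_Mean"])
        length = end - start  # segment length, as suggested in the reply above
        # Back-transform the log2 ratio and round to an integer copy number.
        copies.append(round(2 * 2 ** seg_mean))
        if seg_mean > threshold:
            gained_bases += length
        elif seg_mean < -threshold:
            lost_bases += length
    return {"gained_bases": gained_bases, "lost_bases": lost_bases, "copies": copies}


# Invented example rows, not real TCGA data.
DEMO_SEG = "\n".join([
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean",
    "S1\t1\t1000\t2000\t10\t0.0",
    "S1\t1\t2000\t5000\t30\t1.0",
    "S1\t2\t100\t600\t5\t-1.0",
])

print(summarize_segments(DEMO_SEG))
```

Rounding to integer copies is only a rough estimate, since tumor purity and ploidy pull the observed ratios toward zero. For a whole-genome measure you may prefer to divide the gained and lost totals by the total length of all segments, which gives a fraction of the genome altered.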


Hello, I want to know how TCGA generates the "nocnv" file. I downloaded the "masked copy number" data from TCGA, and its description says: "Masked copy number segments are generated with the same method except that a filtering step is performed that removes Y chromosome and probe sets that were previously indicated to have frequent germline copy-number variation." Is the "masked copy number" file the same as the "nocnv" file? If not, how should I generate a "nocnv" file from the "masked copy number" data?
