Question: How to deal with the CNV file and NOCNV file measured by Affy SNP6.0 of level 3 data in TCGA?
6.1 years ago, dingzijian.thu wrote:

Hello all!

I have one question about how to utilize the CNV level 3 data measured by Affy whole-genome SNP6.0 array. 

After I mapped all genes (annotated with UCSC RefSeq refFlat) to the cnv.seg file and the nocnv.seg file, I found that some genes fall within segments in both files. For example:

The MSRB1 gene (one isoform) is on chromosome 6, and the copy number of the region containing this gene is recorded in both the cnv and nocnv files.

The column header line of both files is as follows:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean


In the cnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       149661  18225301        14060   1.045


In the nocnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       1014281 18225301        11991   1.044


We can see that the segment mean is almost the same in the two files, and that the two regions overlap!


As I understand it, the nocnv file is de-noised: CNVs that frequently appear in the normal samples kept by TCGA, and in the resources of the Broad Institute batch, are removed, as described in the tangent normalization part of this pipeline:


So how should we determine the CNV segment mean of the MSRB1 gene: 1.045 or 1.044? Or should it be "NA", since the gene lies in a segment that seems to have been de-noised?


I really need everybody's help! Thank you very much.

cnv tcga gene • 5.8k views
5.6 years ago, Ömer An wrote:

The cnv.seg file includes both germline and somatic CNVs, whereas the nocnv.seg file includes only somatic CNVs. So in your case the MSRB1 gene undergoes a somatic CNV, since it is retained in the nocnv.seg file.

However, the nocnv.seg file cannot be obtained by simply removing regions from the cnv.seg file, because the segments are recalculated (I am not sure exactly how this is done). So it is normal to see a slightly different segment mean between the two files: the segment itself has changed, meaning the original segment was broken up and the germline part was removed. In your case, the part from 149661 to 1014281 was removed as germline, and the part from 1014281 to 18225301 was retained as somatic, and it still includes your gene. If you are interested in the somatic alteration of your gene, you should therefore take 1.044 as the segment mean for it.
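To make the lookup concrete, here is a minimal Python sketch that parses the two lines posted in the question and returns the segment that covers a gene. The gene coordinates are hypothetical placeholders (the question does not give the position of MSRB1), and `parse_seg_line` and `segments_overlapping` are illustrative helper names, not part of any TCGA tool.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    num_probes: int
    segment_mean: float


def parse_seg_line(line: str) -> Segment:
    """Parse one whitespace-separated data line of a .seg file."""
    sample, chrom, start, end, probes, mean = line.split()
    return Segment(sample, chrom, int(start), int(end), int(probes), float(mean))


def segments_overlapping(segments: List[Segment], chrom: str,
                         gene_start: int, gene_end: int) -> List[Segment]:
    """Return every segment on `chrom` that overlaps [gene_start, gene_end]."""
    return [s for s in segments
            if s.chrom == chrom and s.start <= gene_end and s.end >= gene_start]


SAMPLE = "FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948"
cnv = [parse_seg_line(f"{SAMPLE} 6 149661 18225301 14060 1.045")]
nocnv = [parse_seg_line(f"{SAMPLE} 6 1014281 18225301 11991 1.044")]

# Hypothetical gene coordinates, chosen only to fall inside the retained segment.
GENE_CHROM, GENE_START, GENE_END = "6", 5_000_000, 5_010_000

hits = segments_overlapping(nocnv, GENE_CHROM, GENE_START, GENE_END)
# Use the nocnv.seg value for the somatic call; None if the gene is not
# covered by exactly one segment (no segment, or it spans a breakpoint).
somatic_mean: Optional[float] = hits[0].segment_mean if len(hits) == 1 else None
print(somatic_mean)
```

A gene that lies only in the removed part (149661 to 1014281) would get no hit in the nocnv list, which corresponds to the "NA" case raised in the question.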


Hello bounlu, I was wondering: if I am interested in germline CNVs instead of somatic CNVs, would I take 1.045 as the segment mean for the germline CNV from 149661 to 1014281 in the example above? And in general, is this how I would get germline CNV information from level 3 CNV data: by removing the somatic CNV regions (those found in the nocnv.seg files) from the somatic+germline regions (those found in the cnv.seg files)?
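The subtraction described here can be sketched in Python as plain interval arithmetic. This is only an illustration of the idea, not a validated way to call germline CNVs: as noted in the answer above, the segments are recalculated between the two files, so the leftover regions and the segment mean attached to them are approximations at best. `subtract_intervals` is a hypothetical helper name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def subtract_intervals(whole: Interval, pieces: List[Interval]) -> List[Interval]:
    """Return the parts of `whole` that are not covered by any interval in `pieces`.

    Boundary positions are treated as shared between neighbouring parts,
    matching how the coordinates were quoted in the thread.
    """
    start, end = whole
    remaining = []
    cursor = start
    for p_start, p_end in sorted(pieces):
        if p_end < cursor or p_start > end:
            continue  # piece lies entirely outside what is left
        if p_start > cursor:
            remaining.append((cursor, p_start))
        cursor = max(cursor, p_end)
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


# Coordinates from the example in the question (chromosome 6).
cnv_segment = (149661, 18225301)        # from cnv.seg
nocnv_segments = [(1014281, 18225301)]  # from nocnv.seg

cnv_only = subtract_intervals(cnv_segment, nocnv_segments)
print(cnv_only)
```

For the posted example this leaves the single region from 149661 to 1014281, the part described above as removed.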

Reply written 4.5 years ago by georgiaskar
5.4 years ago, Xinsen Xu wrote:

Hi, this is a really nice topic. I'm also starting to analyze the level 3 CNV data. May I ask how you mapped the genes to the cnv.seg file?

I've got the data below, and I want to annotate it with gene names so that I can see the copy number changes of some important genes. You mentioned "annotated by UCSC Refseq: refFlat", but I don't know anything about it. Could you say a little more about how to do this?

Thanks so much.

Sample                           Chromosome    Start        End            Num_Probes    Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01     1             3218610      9104465        3536          0.3108
TCGA-3M-AB46-01A-11D-A40Z-01     1             9112661      18443992       4501          0.7259
TCGA-3M-AB46-01A-11D-A40Z-01     1             18444009     24463116       3487          0.2912
TCGA-3M-AB46-01A-11D-A40Z-01     1             24464705     32041750       3578         -0.2615
TCGA-3M-AB46-01A-11D-A40Z-01     1             32049394     120523955      52536         0.0888
TCGA-3M-AB46-01A-11D-A40Z-01     1             120527361    197082208      28725         0.4574
TCGA-3M-AB46-01A-11D-A40Z-01     1             197086110    214546519      11254         0.0608
TCGA-3M-AB46-01A-11D-A40Z-01     1             214547938    247813706      21186         0.4461
TCGA-3M-AB46-01A-11D-A40Z-01     2             484222       58250717       33696         0.0731
TCGA-3M-AB46-01A-11D-A40Z-01     2             58251785     58362414       59            0.4514

Look at the GenomicRanges Bioconductor package.

Read this paper as well
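For readers who want to see the overlap logic spelled out, here is a rough standard-library Python sketch of the same idea (GenomicRanges itself is an R package and is not used here). It assumes the standard UCSC refFlat column order (geneName, name, chrom, strand, txStart, txEnd, ...), with 0-based txStart and "chr"-prefixed chromosome names. The two refFlat rows are made-up placeholder genes, not real annotations; the segment rows are copied from the table above.

```python
import csv
import io

SEG_TEXT = """Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01\t1\t3218610\t9104465\t3536\t0.3108
TCGA-3M-AB46-01A-11D-A40Z-01\t1\t9112661\t18443992\t4501\t0.7259
TCGA-3M-AB46-01A-11D-A40Z-01\t1\t18444009\t24463116\t3487\t0.2912
"""

# Hypothetical refFlat rows (placeholder gene names and coordinates).
REFFLAT_TEXT = (
    "GENE_A\tNM_HYPO_1\tchr1\t+\t4000000\t4050000\t4000100\t4049000\t1\t4000000,\t4050000,\n"
    "GENE_B\tNM_HYPO_2\tchr1\t-\t9100000\t9120000\t9100100\t9119000\t1\t9100000,\t9120000,\n"
)


def read_segments(text):
    """Read a .seg table into a list of (chrom, start, end, segment_mean)."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(r["Chromosome"], int(r["Start"]), int(r["End"]), float(r["Segment_Mean"]))
            for r in rows]


def read_refflat(text):
    """Read refFlat rows into (gene, chrom, start, end) with 1-based start."""
    genes = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        gene, _name, chrom, _strand, tx_start, tx_end = row[:6]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # seg files use "1", refFlat uses "chr1"
        genes.append((gene, chrom, int(tx_start) + 1, int(tx_end)))
    return genes


def annotate(genes, segments):
    """Map each gene to the segment means of all segments it overlaps."""
    result = {}
    for gene, g_chrom, g_start, g_end in genes:
        result[gene] = [mean for chrom, start, end, mean in segments
                        if chrom == g_chrom and start <= g_end and end >= g_start]
    return result


gene_to_means = annotate(read_refflat(REFFLAT_TEXT), read_segments(SEG_TEXT))
print(gene_to_means)
```

A gene that straddles a breakpoint, like the placeholder GENE_B here, overlaps two segments, so you have to decide how to summarize it (for example by reporting both values or by weighting by overlap length). The nested loop is fine for a few samples; for whole cohorts an interval-based tool such as GenomicRanges will be much faster.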

Reply written 4.8 years ago by Ming Tang (modified 10 months ago by RamRS)