Question

How should I handle the cnv and nocnv files in TCGA level 3 data measured on the Affymetrix SNP 6.0 array?

9.6 years ago

dingzijian.thu • 0

Hello all!

I have a question about how to use the level 3 CNV data measured on the Affymetrix Genome-Wide SNP 6.0 array.

After I mapped all genes (annotated with UCSC RefSeq refFlat) to the cnv.seg and nocnv.seg files, I found that some genes fall within segments in both files. For example:

The MSRB1 gene (one isoform) is on chromosome 6, and the region containing this gene is recorded in both the cnv and nocnv files.

The column header line of both files is as follows:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean

In the cnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       149661  18225301        14060   1.045

In the nocnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       1014281 18225301        11991   1.044

We can see that the segment mean is almost the same in both files and that the two regions overlap!

My understanding is that the nocnv file has been de-noised by removing CNVs that frequently appear in the normal samples kept by TCGA and in the Broad Institute's batch resources, as described in the tangent normalization part of the pipeline.

So how should we determine the segment mean for the MSRB1 gene: 1.045 or 1.044? Or should it be "NA", since the gene lies in a segment that appears to have been de-noised?

I really need everybody's help! Thank you very much.

CNV TCGA gene • 7.0k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by dingzijian.thu • 0

score 1 · Answer 1 · 2015-03-27

The cnv.seg file includes both germline and somatic CNVs, whereas the nocnv.seg file includes only somatic CNVs. So in your case the MSRB1 gene has undergone a somatic CNV, since it is retained in the nocnv.seg file.

The nocnv.seg file, however, cannot be obtained by just removing regions from the cnv.seg file, since the segments are recalculated (I am not sure exactly how this is done). So it is normal to get a slightly different segment mean between the two files: the segment boundaries have changed from one file to the other, meaning the original segment was broken up and the germline part was removed. In your case, the part from 149661 to 1014281 was removed as germline, and the part from 1014281 to 18225301 was retained as somatic, and it still includes your gene. So if you are interested in the somatic alteration affecting your gene, you should take 1.044 as its segment mean.
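To make the reasoning above concrete, here is a minimal Python sketch (not part of any TCGA tooling) that compares the two segments quoted in the question, reports which part of the cnv.seg segment is absent from the nocnv.seg segment, and then reads the segment mean for a gene from the nocnv.seg segment. The gene coordinates are hypothetical placeholders, since the real MSRB1 coordinates are not given in this thread, and the helper names (`parse_seg_line`, `removed_part`, `segment_mean_for`) are my own.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    num_probes: int
    seg_mean: float


def parse_seg_line(line: str) -> Segment:
    """Parse one whitespace-separated data line of a .seg file."""
    sample, chrom, start, end, probes, mean = line.split()
    return Segment(sample, chrom, int(start), int(end), int(probes), float(mean))


def removed_part(cnv: Segment, nocnv: Segment) -> List[Tuple[int, int]]:
    """Intervals of the cnv segment that the nocnv segment does not cover."""
    if cnv.chrom != nocnv.chrom:
        raise ValueError("segments are on different chromosomes")
    gaps = []
    if nocnv.start > cnv.start:
        gaps.append((cnv.start, min(nocnv.start, cnv.end)))
    if nocnv.end < cnv.end:
        gaps.append((max(nocnv.end, cnv.start), cnv.end))
    return gaps


def segment_mean_for(gene_start: int, gene_end: int, seg: Segment) -> Optional[float]:
    """Segment mean if the gene lies entirely inside the segment, else None."""
    if seg.start <= gene_start and gene_end <= seg.end:
        return seg.seg_mean
    return None


SAMPLE = "FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948"
cnv = parse_seg_line(f"{SAMPLE} 6 149661 18225301 14060 1.045")
nocnv = parse_seg_line(f"{SAMPLE} 6 1014281 18225301 11991 1.044")

# Part of the cnv.seg segment that is missing from nocnv.seg
print(removed_part(cnv, nocnv))  # [(149661, 1014281)]

# HYPOTHETICAL gene coordinates, for illustration only
GENE_START, GENE_END = 5_000_000, 5_010_000
print(segment_mean_for(GENE_START, GENE_END, nocnv))  # 1.044
```

A gene that falls in the removed interval would return `None` from the nocnv.seg lookup, which corresponds to the "NA" case raised in the question.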

Ram · Answer 2 · 2015-06-22

Hi, this is a really nice topic. I'm also starting to analyze the CNV level 3 data. May I ask how you mapped the genes to the cnv.seg file?

I have this data, and I want to annotate the file with gene names so that I can see the copy number changes of some important genes. You mentioned "annotated by UCSC Refseq: refFlat", but I don't know anything about it. Could you explain a little more about how to do this?

Thanks so much.

Sample                           Chromosome    Start        End            Num_Probes    Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01     1             3218610      9104465        3536          0.3108
TCGA-3M-AB46-01A-11D-A40Z-01     1             9112661      18443992       4501          0.7259
TCGA-3M-AB46-01A-11D-A40Z-01     1             18444009     24463116       3487          0.2912
TCGA-3M-AB46-01A-11D-A40Z-01     1             24464705     32041750       3578         -0.2615
TCGA-3M-AB46-01A-11D-A40Z-01     1             32049394     120523955      52536         0.0888
TCGA-3M-AB46-01A-11D-A40Z-01     1             120527361    197082208      28725         0.4574
TCGA-3M-AB46-01A-11D-A40Z-01     1             197086110    214546519      11254         0.0608
TCGA-3M-AB46-01A-11D-A40Z-01     1             214547938    247813706      21186         0.4461
TCGA-3M-AB46-01A-11D-A40Z-01     2             484222       58250717       33696         0.0731
TCGA-3M-AB46-01A-11D-A40Z-01     2             58251785     58362414       59            0.4514
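For anyone with the same question, one possible way to do this mapping is sketched below in plain Python. It is only an illustration, not the original poster's method. It assumes the seg file is tab-separated with the header shown above, and that gene coordinates come from a UCSC refFlat file (tab-separated columns: geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, with 0-based txStart and "chr"-prefixed chromosome names). The three gene rows are hypothetical placeholders, not real refFlat entries, and the genome build of the two files is assumed to match.

```python
import csv
import io
from collections import defaultdict

# A few segments copied from the table above, tab-separated
SEG_TEXT = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "TCGA-3M-AB46-01A-11D-A40Z-01\t1\t3218610\t9104465\t3536\t0.3108\n"
    "TCGA-3M-AB46-01A-11D-A40Z-01\t1\t9112661\t18443992\t4501\t0.7259\n"
    "TCGA-3M-AB46-01A-11D-A40Z-01\t1\t18444009\t24463116\t3487\t0.2912\n"
    "TCGA-3M-AB46-01A-11D-A40Z-01\t1\t24464705\t32041750\t3578\t-0.2615\n"
    "TCGA-3M-AB46-01A-11D-A40Z-01\t2\t484222\t58250717\t33696\t0.0731\n"
)

# HYPOTHETICAL refFlat-style rows (gene names and coordinates are made up)
REFFLAT_TEXT = (
    "GENE_A\tNM_HYPO_1\tchr1\t+\t10000000\t10050000\t10000100\t10049000\t1\t10000000,\t10050000,\n"
    "GENE_B\tNM_HYPO_2\tchr1\t-\t24460000\t24470000\t24460100\t24469000\t1\t24460000,\t24470000,\n"
    "GENE_C\tNM_HYPO_3\tchr2\t+\t100\t200\t100\t200\t1\t100,\t200,\n"
)


def load_segments(handle):
    """Group (start, end, segment_mean) tuples by chromosome."""
    by_chrom = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_chrom[row["Chromosome"]].append(
            (int(row["Start"]), int(row["End"]), float(row["Segment_Mean"]))
        )
    return by_chrom


def load_genes(handle):
    """Yield (gene, transcript, chrom, start, end) with 1-based start."""
    for fields in csv.reader(handle, delimiter="\t"):
        gene, transcript, chrom, _strand, tx_start, tx_end = fields[:6]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # seg files use "1", refFlat uses "chr1"
        yield gene, transcript, chrom, int(tx_start) + 1, int(tx_end)


def annotate(genes, segments):
    """Map each (gene, transcript) to every segment it overlaps."""
    hits = {}
    for gene, transcript, chrom, start, end in genes:
        hits[(gene, transcript)] = [
            seg for seg in segments.get(chrom, [])
            if seg[0] <= end and start <= seg[1]
        ]
    return hits


segments = load_segments(io.StringIO(SEG_TEXT))
annotations = annotate(load_genes(io.StringIO(REFFLAT_TEXT)), segments)
for key, segs in annotations.items():
    print(key, segs)
```

In this toy input, GENE_A falls inside one segment, GENE_B straddles the boundary between two segments (so you would need to decide how to summarize it), and GENE_C overlaps no segment. For whole-genome data across many samples, an interval-based tool would be faster than this nested loop.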