Question: How to deal with the CNV file and NOCNV file measured by Affy SNP6.0 of level 3 data in TCGA?
dingzijian.thu (China) wrote, 4.4 years ago:

Hello all!

I have a question about how to use the level 3 CNV data measured on the Affymetrix Genome-Wide SNP 6.0 array.

After mapping all genes (annotated with UCSC RefSeq refFlat) to the cnv.seg and nocnv.seg files, I found that some genes fall within segments in both files. For example:

One isoform of the MSRB1 gene is on chromosome 6, and the region containing this gene is recorded in both the cnv and nocnv files.

The column header line of both files is as follows:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean

 

In the cnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       149661  18225301        14060   1.045

 

In the nocnv.seg file:

FRUIT_p_TCGAb_327_328_329_NSP_GenomeWideSNP_6_A06_1367948       6       1014281 18225301        11991   1.044

 

We can see that the segment means are almost identical and that the two regions overlap!
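The overlap between the two rows can be checked directly. The following is a minimal sketch in Python; the `Segment` type and `overlap_length` helper are names made up for illustration, and the coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One row of a .seg file (the Sample column is omitted here)."""
    chrom: str
    start: int
    end: int
    num_probes: int
    seg_mean: float


def overlap_length(a: Segment, b: Segment) -> int:
    """Number of positions shared by two segments (0 if they do not overlap).

    Assumes closed, 1-based coordinates.
    """
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# The two rows quoted in the question.
cnv = Segment("6", 149661, 18225301, 14060, 1.045)
nocnv = Segment("6", 1014281, 18225301, 11991, 1.044)

shared = overlap_length(cnv, nocnv)
nocnv_length = nocnv.end - nocnv.start + 1

# True when the nocnv segment lies entirely inside the cnv segment.
print(shared == nocnv_length)
# Probes present in the cnv segment but not in the nocnv segment.
print(cnv.num_probes - nocnv.num_probes)
```

For these two rows the nocnv segment is fully contained in the cnv segment, and the cnv segment carries 2069 more probes.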

 

The nocnv file is supposed to be de-noised with respect to CNVs that frequently appear in the normal samples kept by TCGA and in the resources for the same batch at the Broad Institute, as described in the tangent normalization part of this pipeline:

http://www.broadinstitute.org/cancer/software/genepattern/modules/snp6copynumberpipeline

 

So how should we determine the CNV segment mean of the MSRB1 gene: 1.045 or 1.044? Or should it be "NA", since the gene lies in a segment that seems to have been de-noised?

 

I really need everybody's help! Thank you very much.

cnv tcga gene • 5.0k views
bounlu (Singapore) wrote, 3.9 years ago:

The cnv.seg file includes both germline and somatic CNVs, whereas the nocnv.seg file includes only somatic CNVs. So in your case the MSRB1 gene undergoes a somatic CNV, since it is retained in the nocnv.seg file.

The nocnv.seg file, however, cannot be obtained by simply removing the regions present in the cnv.seg file, because the segments are recalculated (I am not sure how this was done, though). So it is normal to get a slightly different segment mean between the two files: the segment has changed from one file to the other, meaning that it was broken down and the germline part was removed. In your case, the part from 149661 to 1014281 was removed as germline, and the part from 1014281 to 18225301 was retained as somatic and still includes your gene. You should therefore take 1.044 as the segment mean for your gene if you are interested in the somatic alteration that your gene undergoes.
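The lookup suggested in this answer (take the value from the nocnv.seg segment that contains the gene) might be sketched as follows. This is only an illustration: `gene_segment_mean` is a made-up helper, the gene coordinates are hypothetical because the thread does not give them, and choosing the segment with the largest overlap is an assumption for genes that span more than one segment.

```python
def gene_segment_mean(segments, chrom, gene_start, gene_end):
    """Return the Segment_Mean of the segment that covers most of the gene.

    `segments` is a list of (chrom, start, end, seg_mean) tuples with closed
    coordinates. Returns None when no segment overlaps the gene, which could
    be reported as "NA".
    """
    best_mean, best_overlap = None, 0
    for seg_chrom, seg_start, seg_end, seg_mean in segments:
        if seg_chrom != chrom:
            continue
        overlap = min(seg_end, gene_end) - max(seg_start, gene_start) + 1
        if overlap > best_overlap:
            best_mean, best_overlap = seg_mean, overlap
    return best_mean


# The nocnv.seg row from the question.
nocnv_segments = [("6", 1014281, 18225301, 1.044)]

# Hypothetical gene coordinates, for illustration only.
inside = gene_segment_mean(nocnv_segments, "6", 1500000, 1510000)
outside = gene_segment_mean(nocnv_segments, "6", 200000, 300000)

print(inside)   # gene inside the retained segment
print(outside)  # gene in the part that is absent from nocnv.seg
```

A gene inside the retained segment gets 1.044, while a gene in the removed part (149661 to 1014281) gets None.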


Hello bounlu, I was wondering: if I am interested in germline CNVs instead of somatic CNVs, would I take 1.045 as the segment mean for the germline CNV from 149661 to 1014281 in the example above? Is this, in general, how I would get germline CNV information from the level 3 CNV data: by removing the somatic CNV segments (those found in the nocnv.seg files) from the somatic+germline segments (those found in the cnv.seg files)?

Reply written 2.8 years ago by georgiaskar
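The subtraction described in this reply can be written as a small interval operation. The sketch below only illustrates the arithmetic: `subtract_intervals` is a made-up helper, coordinates are assumed to be closed, and the thread does not confirm that the leftover region is a germline CNV in this sample.

```python
def subtract_intervals(whole, to_remove):
    """Return the parts of `whole` that are not covered by `to_remove`.

    `whole` is a (start, end) pair and `to_remove` a list of such pairs,
    all on the same chromosome and with closed coordinates.
    """
    pieces = []
    cursor = whole[0]  # first position not yet accounted for
    for start, end in sorted(to_remove):
        if end < cursor or start > whole[1]:
            continue  # interval lies entirely outside the remaining part
        if start > cursor:
            pieces.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= whole[1]:
        pieces.append((cursor, whole[1]))
    return pieces


# cnv.seg segment minus the nocnv.seg segment from the question.
leftover = subtract_intervals((149661, 18225301), [(1014281, 18225301)])
print(leftover)
```

For the example rows this leaves a single region from 149661 to 1014280.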
Xinsen Xu (US/Boston/Harvard) wrote, 3.7 years ago:

Hi, this is a really nice topic. I'm also starting to analyze the level 3 CNV data. May I ask how you mapped the genes to the cnv.seg file?

I have this data, and I want to annotate this file with gene names so that I can see the copy number changes of some important genes. You mentioned "annotated by UCSC Refseq: refFlat", but I don't know anything about it. Could you say a little more about how to do it?

Thanks so much.

 

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01    1    3218610    9104465    3536    0.3108
TCGA-3M-AB46-01A-11D-A40Z-01    1    9112661    18443992    4501    0.7259
TCGA-3M-AB46-01A-11D-A40Z-01    1    18444009    24463116    3487    0.2912
TCGA-3M-AB46-01A-11D-A40Z-01    1    24464705    32041750    3578    -0.2615
TCGA-3M-AB46-01A-11D-A40Z-01    1    32049394    120523955    52536    0.0888
TCGA-3M-AB46-01A-11D-A40Z-01    1    120527361    197082208    28725    0.4574
TCGA-3M-AB46-01A-11D-A40Z-01    1    197086110    214546519    11254    0.0608
TCGA-3M-AB46-01A-11D-A40Z-01    1    214547938    247813706    21186    0.4461
TCGA-3M-AB46-01A-11D-A40Z-01    2    484222    58250717    33696    0.0731
TCGA-3M-AB46-01A-11D-A40Z-01    2    58251785    58362414    59    0.4514

 


Look at the GenomicRanges Bioconductor package: https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html

Read this paper as well: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118

Reply written 3.1 years ago by Ming Tang
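GenomicRanges is an R package. As a rough illustration of the same overlap idea, here is a plain Python sketch rather than GenomicRanges itself. The gene names and coordinates in `GENES` are hypothetical placeholders, not refFlat records, and `read_segments` and `annotate` are made-up helper names. The segment rows are three of those posted above.

```python
import csv
import io

# Three of the rows posted above, tab-separated as in a .seg file.
SEG_TEXT = """Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01\t1\t3218610\t9104465\t3536\t0.3108
TCGA-3M-AB46-01A-11D-A40Z-01\t1\t9112661\t18443992\t4501\t0.7259
TCGA-3M-AB46-01A-11D-A40Z-01\t2\t484222\t58250717\t33696\t0.0731
"""

# Hypothetical gene table: (name, chromosome, start, end).
GENES = [
    ("GENE_A", "1", 5000000, 5050000),
    ("GENE_B", "1", 9100000, 9120000),    # spans the gap between two segments
    ("GENE_C", "2", 60000000, 60010000),  # outside every listed segment
]


def read_segments(handle):
    """Parse a tab-separated .seg file into (chrom, start, end, mean) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Chromosome"], int(row["Start"]), int(row["End"]),
         float(row["Segment_Mean"]))
        for row in reader
    ]


def annotate(genes, segments):
    """Map each gene name to the Segment_Mean of every overlapping segment."""
    hits = {}
    for name, chrom, gene_start, gene_end in genes:
        hits[name] = [
            mean
            for seg_chrom, seg_start, seg_end, mean in segments
            if seg_chrom == chrom and seg_start <= gene_end and seg_end >= gene_start
        ]
    return hits


segments = read_segments(io.StringIO(SEG_TEXT))
result = annotate(GENES, segments)
for gene, means in result.items():
    print(gene, means)
```

A gene can overlap zero, one, or several segments, so each gene maps to a list. With a real refFlat table, note that it writes chromosomes as `chr1` rather than `1` and uses 0-based start coordinates, so both would need to be converted before comparing with the .seg file.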