Question: How To Calculate Degree Of Deletion And Amplification Of Cnv Given Snp Array Data From Tcga?
6.0 years ago by
jessada11540 wrote:

I have CNV SNP array data from TCGA that looks like this:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
Sample1    1    61735    757469    46    0.5909
Sample1    1    757923    12852748    6470    -0.1666
Sample1    1    12857863    13776072    94    0.2141
Sample1    1    13776828    16149915    1792    -0.1672
Sample1    1    16153497    16155010    10    1.1636
Sample1    1    16165661    17012422    355    -0.1473
Sample1    1    17012456    17247727    81    0.1974
Sample1    1    17247845    25583341    5292    -0.1525
Sample1    1    25593128    25611452    14    -2.5747

and I'd like to convert it into a format that looks like this:

Gene1    0.2729
Gene2    -0.5803
Gene3    0.9857

In the result, '0.2729', '-0.5803', and '0.9857' are the degrees of deletion or amplification, and 'Gene1', 'Gene2', 'Gene3' should be named according to the HUGO standard.

Where can I find the tools that can do this kind of annotation?

gene tcga cnv snp • 5.8k views
ADD COMMENTlink modified 5.9 years ago by Yamol40 • written 6.0 years ago by jessada11540
5.9 years ago by
Yamol40 wrote:

what do Num_Probes and Segment_Mean stand for?

ADD COMMENTlink written 5.9 years ago by Yamol40

The number of consecutive probes that comprise that segment, and the mean value of those probes. See the documentation for the R DNAcopy package for more details.

ADD REPLYlink written 5.9 years ago by Chris Miller21k

Do you mean that Segment_Mean stands for "log2(Detected Number/2)"? So values > 0 are amplifications and values < 0 are deletions? It seems there's no need to use Num_Probes for CNV.

ADD REPLYlink written 5.9 years ago by Yamol40

You might not need the number of probes, but for filtering and QC purposes, they can be invaluable, because a) probes are not evenly spaced and b) segments defined by larger numbers of probes are generally higher-confidence scores.

ADD REPLYlink written 5.9 years ago by Chris Miller21k

It's not quite that simple - if your segment_mean is 0.07 (~= 2.1 copies), it's not particularly accurate to call that an amplification. The difference from 2 is usually just a result of noise. Setting reasonable thresholds for gain and loss is a hard problem, especially when you take into account things like subclonal copy number events in cancer.
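A minimal sketch of what thresholding could look like, combining the two points above (noise around zero, and low probe counts). The cutoffs of +/-0.3 and the minimum of 10 probes are placeholder assumptions for illustration only, not recommended values; the function name is hypothetical.

```python
def call_segment(segment_mean, num_probes, gain=0.3, loss=-0.3, min_probes=10):
    """Label one segment as gain, loss, neutral, or low_confidence.

    gain, loss and min_probes are illustrative defaults (assumptions);
    pick them from your own data and QC.
    """
    if num_probes < min_probes:
        return "low_confidence"  # too few probes to trust the segment mean
    if segment_mean >= gain:
        return "gain"
    if segment_mean <= loss:
        return "loss"
    return "neutral"  # small deviations from 0 are treated as noise


# Rows taken from the example table in the question: (Num_Probes, Segment_Mean)
for probes, mean in [(6470, -0.1666), (10, 1.1636), (14, -2.5747)]:
    print(probes, mean, call_segment(mean, probes))
```

Fixed cutoffs like these ignore tumor purity and subclonal events, so treat the labels as a first pass.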

ADD REPLYlink written 5.9 years ago by Chris Miller21k

Thanks so much! I really appreciate your help with my PhD research!

ADD REPLYlink written 5.8 years ago by Yamol40

How did you calculate that there are ~=2.1 copies if the segment_mean is 0.07?

ADD REPLYlink written 3.4 years ago by khagay0

2^0.07*2 = 2.099433

I assume that I rounded :)

(edit - I screwed up while typing in a meeting earlier. You raise 2 to the nth power, then multiply by two, since the assumption is that the normal sample is diploid.)
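The same arithmetic as a small Python sketch, assuming Segment_Mean is log2(tumor copies / normal copies) and the normal sample is diploid. The function and parameter names are made up for illustration.

```python
def log2_ratio_to_copies(segment_mean: float, normal_ploidy: int = 2) -> float:
    """Convert a log2 ratio (Segment_Mean) to an estimated copy number.

    Assumes the reference (normal) sample has `normal_ploidy` copies.
    """
    return normal_ploidy * 2 ** segment_mean


# 0.07 is the value discussed above; the other two come from the question's table.
for seg_mean in (0.07, 0.5909, -2.5747):
    print(f"{seg_mean:+.4f} -> {log2_ratio_to_copies(seg_mean):.2f} copies")
```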

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by Chris Miller21k
6.0 years ago by
arno.guille400 wrote:

To my knowledge, no such conversion tool exists. However, I have three solutions for you. The first two require coding skills.

The first: Get the coordinates of all the RefSeq genes from UCSC (Tools -> Table Browser). Then match the coordinates between the RefSeq file and your CNV file. You can compute the degree of deletion/amplification by taking the absolute median, for example.

The second: The R package "cgdsr", which provides a basic set of R functions for querying the Cancer Genomics Data Server (CGDS) and in particular TCGA data.

The third: The cBio Cancer Genomics Portal, which provides visualization, analysis and download of large-scale cancer genomics data sets, including TCGA data.

ADD COMMENTlink written 6.0 years ago by arno.guille400
6.0 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

This is a straightforward coding exercise that can be accomplished with a few lines of Perl, or by using something like BEDTools.

Essentially, you're going to take a file containing coordinates for every gene, and intersect it with the regions of copy number alteration. Watch out for edge cases - what happens when a gene spans two or more copy number regions?
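A rough Python sketch of that intersection, as one possible way to handle the edge case: when a gene spans several segments, it takes the mean of their Segment_Mean values weighted by overlap length. This is only one choice (taking the most extreme segment would be another). It assumes 1-based inclusive coordinates, and the gene names and coordinates below are hypothetical; in practice they would come from a gene annotation file.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "chrom start end num_probes mean")
Gene = namedtuple("Gene", "name chrom start end")


def gene_level_value(gene, segments):
    """Overlap-weighted mean of Segment_Mean over segments touching the gene.

    Returns None when no segment overlaps the gene.
    Coordinates are assumed to be 1-based and inclusive.
    """
    weighted_sum = 0.0
    total_overlap = 0
    for seg in segments:
        if seg.chrom != gene.chrom:
            continue
        # Number of bases shared by the gene and the segment (<= 0 means none)
        overlap = min(gene.end, seg.end) - max(gene.start, seg.start) + 1
        if overlap > 0:
            weighted_sum += seg.mean * overlap
            total_overlap += overlap
    if total_overlap == 0:
        return None
    return weighted_sum / total_overlap


# Two adjacent segments from the question's example table
segments = [
    Segment("1", 16153497, 16155010, 10, 1.1636),
    Segment("1", 16165661, 17012422, 355, -0.1473),
]

# Hypothetical genes: one inside a single segment, one spanning both
genes = [
    Gene("GeneA", "1", 16170000, 16180000),
    Gene("GeneB", "1", 16154000, 16166000),
]

for gene in genes:
    print(gene.name, gene_level_value(gene, segments))
```

Bases of the gene that fall in the gap between segments are simply ignored here; whether that is acceptable depends on the analysis.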

ADD COMMENTlink written 6.0 years ago by Chris Miller21k