Question: TCGA/Broad Institute CNV Files Segment Annotation
4.1 years ago, Xinsen Xu wrote:

Hi guys, I'm trying to use the TCGA copy number segment files to analyze copy number changes in specific genes.

I've downloaded the data, and it looks like this:

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01    1    3218610    9104465    3536    0.3108
TCGA-3M-AB46-01A-11D-A40Z-01    1    9112661    18443992    4501    0.7259
TCGA-3M-AB46-01A-11D-A40Z-01    1    18444009    24463116    3487    0.2912
TCGA-3M-AB46-01A-11D-A40Z-01    1    24464705    32041750    3578    -0.2615



I don't know how to annotate this data, for example how to tell which rows hold the copy number data for genes such as p53 (TP53), PTEN, or KRAS. Could anyone help me with this? Thanks so much!



4.0 years ago, Eric T. (San Francisco, CA) wrote:

Most of the segments in your file (this is SEG format) cover multiple genes, so adding the gene names directly to your SEG file could get messy. What to do depends on your next goal.

You can load this SEG file in IGV and the segment mean values for each sample will be displayed as a heatmap. Then you can view the segment data at any resolution, search for genes by name, etc. This might be enough.

If you want to tabulate the data differently, e.g. create a table of altered genes vs. the samples they occur in, or the number of samples per gene, you can download a table of gene annotations from the UCSC Genome Browser and use general-purpose tools for tabular data (bedtools, R, Python pandas, awk, ...) to extract the information you want. In R, consider the Bioconductor GenomicRanges package: combined with a gene annotation package, it lets you fetch and apply gene annotations directly, without manually downloading a BED file from UCSC. Note that a lot of this aggregation has already been done by others, so also check cBioPortal and similar resources to save yourself time.
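As a rough illustration of the pandas route, here is a minimal sketch that builds a gene-by-sample table from the rows posted in the question. The gene table is an assumption: `GENE_A` and `GENE_B` and their coordinates are made-up placeholders, not real annotations. In practice you would load a gene table from UCSC for the same genome build as the SEG file. The function name `gene_by_sample` is also just a name chosen for this example.

```python
import io

import pandas as pd

# The four segments posted in the question (whitespace-separated SEG format).
SEG_TEXT = """Sample Chromosome Start End Num_Probes Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01 1 3218610 9104465 3536 0.3108
TCGA-3M-AB46-01A-11D-A40Z-01 1 9112661 18443992 4501 0.7259
TCGA-3M-AB46-01A-11D-A40Z-01 1 18444009 24463116 3487 0.2912
TCGA-3M-AB46-01A-11D-A40Z-01 1 24464705 32041750 3578 -0.2615
"""

# Read chromosomes as strings so that names like "X" and "Y" also work.
seg = pd.read_csv(io.StringIO(SEG_TEXT), sep=r"\s+", dtype={"Chromosome": str})

# HYPOTHETICAL gene table: names and coordinates are placeholders only.
genes = pd.DataFrame(
    {
        "gene": ["GENE_A", "GENE_B"],
        "chrom": ["1", "1"],
        "gene_start": [10_000_000, 25_000_000],
        "gene_end": [10_050_000, 25_100_000],
    }
)


def gene_by_sample(seg: pd.DataFrame, genes: pd.DataFrame) -> pd.DataFrame:
    """Return a genes x samples table of segment means."""
    # Pair every segment with every gene on the same chromosome.
    pairs = seg.merge(genes, left_on="Chromosome", right_on="chrom")
    # Keep only the pairs where the segment and the gene overlap.
    hits = pairs[(pairs["Start"] <= pairs["gene_end"])
                 & (pairs["End"] >= pairs["gene_start"])]
    # A gene that straddles a breakpoint matches several segments;
    # averaging them is a simplification chosen for this sketch.
    return hits.pivot_table(index="gene", columns="Sample",
                            values="Segment_Mean", aggfunc="mean")


table = gene_by_sample(seg, genes)
print(table)
```

The merge pairs every segment with every gene on the same chromosome, which is fine for a handful of genes but gets large for a whole-genome gene list. For that case bedtools or GenomicRanges will probably be more comfortable.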

4.0 years ago, Na Sed (United States) wrote:

You have to map the regions to genes. Please see here; you will find a complete procedure for doing that.
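Whatever procedure you follow, mapping regions to genes comes down to an interval overlap test. A minimal sketch in plain Python, using the rows from the question, might look like the following. The query interval is a made-up placeholder chosen to straddle a segment boundary, not a real gene, and `parse_seg` and `overlapping` are names invented for this example.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    num_probes: int
    mean: float


# The four segments posted in the question (whitespace-separated SEG format).
SEG_TEXT = """Sample Chromosome Start End Num_Probes Segment_Mean
TCGA-3M-AB46-01A-11D-A40Z-01 1 3218610 9104465 3536 0.3108
TCGA-3M-AB46-01A-11D-A40Z-01 1 9112661 18443992 4501 0.7259
TCGA-3M-AB46-01A-11D-A40Z-01 1 18444009 24463116 3487 0.2912
TCGA-3M-AB46-01A-11D-A40Z-01 1 24464705 32041750 3578 -0.2615
"""


def parse_seg(text: str) -> List[Segment]:
    """Parse SEG text into Segment records, skipping the header line."""
    segments = []
    for line in text.strip().splitlines()[1:]:
        sample, chrom, start, end, probes, mean = line.split()
        segments.append(
            Segment(sample, chrom, int(start), int(end), int(probes), float(mean))
        )
    return segments


def overlapping(segments: List[Segment], chrom: str,
                start: int, end: int) -> List[Segment]:
    """Return the segments that overlap chrom:start-end."""
    return [
        s for s in segments
        if s.chrom == chrom and s.start <= end and s.end >= start
    ]


# HYPOTHETICAL query interval: a placeholder region, not a real gene.
hits = overlapping(parse_seg(SEG_TEXT), "1", 18_400_000, 18_500_000)
for s in hits:
    print(s.sample, s.start, s.end, s.mean)
```

Here the placeholder region crosses a breakpoint, so two segments come back with different means. You then have to decide how to summarize such a gene (for example, report both values or take the more extreme one).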
