How to analyse the number of breakpoints in TCGA data
4.5 years ago
berry ▴ 40

Hi,

I have TCGA segment files and I want to analyse the number of breakpoints. Does anyone know how to calculate it?

Thanks!

breakpoints TCGA • 1.1k views
4.5 years ago

Please help us by stating the exact source of the data that you have obtained. TCGA data is 'sprayed' all over the World Wide Web at this stage, and there are undoubtedly even third- and fourth-level TCGA data sources.

  1. Primary source: Genomic Data Commons
  2. Secondary source: cBioPortal, TCGAbiolinks, Xena, etc
  3. Tertiary source: groups that take secondary source data and re-process it further

Hi Kevin,

You are right. I downloaded the segment files using TCGAbiolinks, and the files look like this:

Sample  Chromosome  Start  End  Num_Probes  Segment_Mean
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  3218610  8355467   3166  -0.0355
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  8372558  9407175   484   -0.9326
GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846  1  9408959  21324564  6343  -0.0398


I see. This looks like copy number data produced by circular binary segmentation (CBS). You now have to define what you mean by 'breakpoint' and then search for these in the data. I find that 'breakpoint' means different things to different people; however, generally it can be regarded as a point of deletion or some other structural 'anomaly', such as an inversion or translocation.


Hi Kevin, what I want to do is count the number of breaks occurring across the genome in each sample, i.e. the positions where the copy number changes (amplified/deleted). I want to see if there is a difference or a trend in breakpoint counts between the groups of samples I have defined (i.e. which samples have a higher copy number load).
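To make that definition concrete, here is a minimal Python sketch. It rests on one assumption that is not from any pipeline documentation: every junction between two consecutive segments on the same chromosome of the same sample counts as one breakpoint, so a chromosome with n segments contributes n - 1 breakpoints. The function name `count_breakpoints` is made up for illustration; the column names are the ones in the file excerpt above.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the excerpt above, written as tab-separated text.
SEG_TEXT = (
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t3218610\t8355467\t3166\t-0.0355\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t8372558\t9407175\t484\t-0.9326\n"
    "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846\t1\t9408959\t21324564\t6343\t-0.0398\n"
)


def count_breakpoints(rows):
    """Return {sample: breakpoint count}.

    Assumption: a breakpoint is any junction between two consecutive
    segments on the same chromosome, i.e. (segments - 1) per chromosome.
    """
    segments_per_chrom = defaultdict(int)
    for row in rows:
        segments_per_chrom[(row["Sample"], row["Chromosome"])] += 1

    breakpoints = defaultdict(int)
    for (sample, _chrom), n_segments in segments_per_chrom.items():
        breakpoints[sample] += n_segments - 1
    return dict(breakpoints)


rows = list(csv.DictReader(io.StringIO(SEG_TEXT), delimiter="\t"))
counts = count_breakpoints(rows)
print(counts)
# {'GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846': 2}
```

This counts every segment boundary, including ones where the copy number barely changes, so it is probably an upper bound on what I actually want.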


Okay, it should be relatively easy to do, in that case. You should review how these files were produced by taking a look here: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/

Keep in mind that they may have undergone extra data processing steps by TCGAbiolinks - check this against the TCGAbiolinks manuscript and/or online documentation.

Once you know how the segments were produced, you can choose the Segment_Mean threshold that defines a copy number breakpoint (for example, a deletion). Detecting translocations and inversions from this data will be next to impossible - you would require the original BAM files at least.
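As a rough illustration of applying a threshold, the Python sketch below counts a junction only when Segment_Mean shifts by at least some cutoff between adjacent segments on the same chromosome. The function name `count_cn_changes` and the default cutoff of 0.3 are placeholders of mine, not values taken from the GDC or TCGAbiolinks documentation; pick the cutoff after reading how the segment means were derived.

```python
from itertools import groupby

SAMPLE = "GLAZY_p_TCGA_B20_SNP_N_GenomeWideSNP_6_H03_517846"

# (sample, chromosome, start, end, segment_mean), taken from the rows posted above.
SEGMENTS = [
    (SAMPLE, "1", 3218610, 8355467, -0.0355),
    (SAMPLE, "1", 8372558, 9407175, -0.9326),
    (SAMPLE, "1", 9408959, 21324564, -0.0398),
]


def count_cn_changes(segments, min_delta=0.3):
    """Count junctions where Segment_Mean shifts by at least min_delta.

    min_delta is a hypothetical cutoff chosen for illustration only.
    """
    def chrom_key(seg):
        return (seg[0], seg[1])

    # Sort so that segments of one sample/chromosome are adjacent and ordered by start.
    ordered = sorted(segments, key=lambda seg: (seg[0], seg[1], seg[2]))

    counts = {}
    for (sample, _chrom), group in groupby(ordered, key=chrom_key):
        means = [seg[4] for seg in group]
        changes = sum(
            1 for left, right in zip(means, means[1:])
            if abs(right - left) >= min_delta
        )
        counts[sample] = counts.get(sample, 0) + changes
    return counts


print(count_cn_changes(SEGMENTS))                 # both junctions pass the 0.3 cutoff
print(count_cn_changes(SEGMENTS, min_delta=1.0))  # neither junction passes a 1.0 cutoff
```

The per-sample totals can then be compared between your sample groups with whatever statistical test suits the design.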
