Understanding gene level copy number data from TCGAbiolinks
16 months ago
billyK • 0

Hi all. Thanks in advance for helping me out.

I'm trying to analyze copy number data from TCGA (using TCGAbiolinks) and to identify genes that are either amplified or deleted.

To download the gene-level copy number data, I used the code below:

library(TCGAbiolinks)
query <- GDCquery(project = 'TCGA-BRCA', data.category = 'Copy Number Variation', data.type = 'Gene Level Copy Number', sample.type = 'Primary Tumor')

I have three questions related to the downloaded data.

First, I'm curious to know the pipeline used to calculate gene level copy numbers.

Secondly, I've noticed that some patients have gene-level copy numbers that are unexpectedly large. For example, 'TCGA-A8-A093-10A-01D-A012-01' had a copy number of 26 for the gene 'ENSG00000085733.16'. I'm curious to know whether this is usual.

Finally, what would be a reasonable cutoff on the gene-level copy number to decide whether a gene is amplified or deleted?

Thank you so much for your help.

CNV TCGA • 625 views
Were you able to get answers to your questions?

This is a GDC question, not a TCGAbiolinks question. For TCGA, GDC uses ASCAT2 (SNP6 arrays) and ASCATNGS (WGS) to produce integer copy number values. Gene-level copy number is then obtained by intersecting each gene region with the segmentation file, with some handling of edge cases. I am pretty sure this is clearly described in the GDC documentation.
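
To make the "intersect gene region with segmentation file" idea concrete, here is a minimal base R sketch. It is not the GDC implementation. The segments, the gene coordinates, and the function name gene_copy_number are all made up for illustration, and the way a gene spanning several segments is summarised (overlap-weighted mean, plus min and max) is my own assumption about one plausible edge-case rule.

```r
# Hypothetical toy segments for one sample (not real TCGA data).
segments <- data.frame(
  chrom = c("chr1", "chr1", "chr1"),
  start = c(1, 1001, 5001),
  end = c(1000, 5000, 9000),
  copy_number = c(2, 4, 1),
  stringsAsFactors = FALSE
)

# Hypothetical gene coordinates.
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  chrom = c("chr1", "chr1", "chr2"),
  start = c(200, 4001, 100),
  end = c(800, 6000, 500),
  stringsAsFactors = FALSE
)

# Summarise the copy number of one gene from the segments it overlaps.
# Assumption: a gene spanning several segments gets an overlap-weighted mean,
# with the min and max reported alongside; GDC's actual rule may differ.
gene_copy_number <- function(gene, segments) {
  hits <- segments[segments$chrom == gene$chrom &
                     segments$start <= gene$end &
                     segments$end >= gene$start, ]
  if (nrow(hits) == 0) {
    # Gene is not covered by any segment.
    return(data.frame(gene_id = gene$gene_id,
                      copy_number = NA_real_,
                      min_copy_number = NA_real_,
                      max_copy_number = NA_real_,
                      stringsAsFactors = FALSE))
  }
  # Number of bases each overlapping segment shares with the gene.
  overlap <- pmin(hits$end, gene$end) - pmax(hits$start, gene$start) + 1
  data.frame(gene_id = gene$gene_id,
             copy_number = sum(hits$copy_number * overlap) / sum(overlap),
             min_copy_number = min(hits$copy_number),
             max_copy_number = max(hits$copy_number),
             stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(seq_len(nrow(genes)), function(i) {
  gene_copy_number(genes[i, ], segments)
}))
print(result)
```

In this toy data, geneA sits inside a single segment, geneB straddles a breakpoint between two segments, and geneC has no overlapping segment and therefore gets NA. Genes like geneB are where the edge-case handling matters, so check the GDC documentation for the rule they actually apply.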