I'm trying to understand TCGA's Level 3 copy number data. Specifically, I found two tables that appear to have been generated with ASCAT, and I want to know what the column names mean and how the data were processed. I've read the GDC copy number pipeline documentation (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/), but it doesn't mention these tables or ASCAT.
The first is a gene-level table with these columns: gene ID, gene name, chromosome, start, end, copy_number, min_copy_number, max_copy_number.
I'd like to use this table, but first I need to know:
- Are these absolute copy number calls, rather than values relative to the germline or some other reference?
- What do 'min_copy_number' and 'max_copy_number' mean?
- What do NAs in the copy number columns mean? There are about 500 of them in this particular BRCA table (~1%).
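For reference, this is roughly how the NAs can be tallied per chromosome, in case their distribution is a clue. It is only a minimal sketch: the underscore column names (`gene_id`, `chromosome`, ...), the tab-separated layout, and the toy rows are my assumptions for illustration, not the real file contents.

```python
import csv
import io

# Hypothetical excerpt: the header names and rows are made up for illustration.
EXAMPLE_TSV = (
    "gene_id\tgene_name\tchromosome\tstart\tend\t"
    "copy_number\tmin_copy_number\tmax_copy_number\n"
    "G1\tAAA\tchr1\t100\t200\t2\t2\t2\n"
    "G2\tBBB\tchr1\t300\t400\t\t\t\n"
    "G3\tCCC\tchrY\t100\t200\tNA\tNA\tNA\n"
    "G4\tDDD\tchr2\t100\t200\t3\t2\t4\n"
)

# Strings treated as missing values (an assumption about how NAs are encoded).
MISSING = {"", "NA", "NaN"}


def count_missing_by_chromosome(handle, column="copy_number"):
    """Return ({chromosome: number of missing values}, total number of rows)."""
    counts = {}
    total = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        total += 1
        value = (row.get(column) or "").strip()
        if value in MISSING:
            chrom = row["chromosome"]
            counts[chrom] = counts.get(chrom, 0) + 1
    return counts, total


counts, total = count_missing_by_chromosome(io.StringIO(EXAMPLE_TSV))
print(counts, total)
```

If the NAs cluster on particular chromosomes or regions, that might hint at whether they are genes with no overlapping segment, but that is a guess on my part.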
I found another table there, containing ASCAT allele-specific segment copy number calls:
Columns: GDC_Aliquot, chromosome, start, end, copy_number, major_copy_number, minor_copy_number
I thought the gene-level copy number calls might come from intersecting this segment table with gene locations, but about 0.3% of the gene-level calls don't match the values in this table. I also thought the major and minor allele copy numbers here might correspond to min_copy_number and max_copy_number in the gene-level table, but they often don't match. They are probably different things, given that min_copy_number and max_copy_number don't add up to the total copy number (they're usually just equal).
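To make the comparison concrete, here is a minimal sketch of the kind of intersection I have in mind. Everything in it is an assumption on my part: coordinates are treated as 1-based and inclusive, the gene-level `copy_number` is taken from the segment with the largest overlap, and min/max are taken across all overlapping segments. I don't know whether the GDC pipeline does any of this; the `Segment` type and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one row of the segment table (total copy number only).
Segment = namedtuple("Segment", "chromosome start end copy_number")


def overlapping_copy_numbers(gene_chrom, gene_start, gene_end, segments):
    """Return (overlap_bp, copy_number) for each segment overlapping the gene.

    Coordinates are assumed to be 1-based and inclusive.
    """
    hits = []
    for seg in segments:
        if seg.chromosome != gene_chrom:
            continue
        overlap = min(gene_end, seg.end) - max(gene_start, seg.start) + 1
        if overlap > 0:
            hits.append((overlap, seg.copy_number))
    return hits


def summarize_gene(gene_chrom, gene_start, gene_end, segments):
    """Guess at a gene-level summary; returns None if no segment covers the gene."""
    hits = overlapping_copy_numbers(gene_chrom, gene_start, gene_end, segments)
    if not hits:
        return None  # no overlapping segment: one possible source of NA
    values = [cn for _, cn in hits]
    # Assumption: use the copy number of the segment with the largest overlap.
    best = max(hits, key=lambda hit: hit[0])[1]
    return {
        "copy_number": best,
        "min_copy_number": min(values),
        "max_copy_number": max(values),
    }


segments = [
    Segment("chr1", 1, 1000, 2),
    Segment("chr1", 1001, 5000, 3),
    Segment("chr2", 1, 5000, 1),
]
# A gene spanning the breakpoint at chr1:1000/1001 gets different min and max.
print(summarize_gene("chr1", 900, 1200, segments))
# A gene on a chromosome with no segments gets None.
print(summarize_gene("chr3", 100, 200, segments))
```

Under this reading, min and max would differ only for genes that span a segment boundary, which would fit with them usually being equal, but I have no confirmation that this is what the columns mean.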
In general, I would like to know what processing steps were taken to arrive at these tables. (I'm honestly just guessing that the gene-level copy numbers come from ASCAT, because they show up when you tick 'ASCAT2' in the TCGA filters.) For example:
- Are they filtered to remove segments that are copy-number altered in the patient's germline, or CNVs that are frequent in the population? I ask because I think ASCAT assumes the matched normal is diploid.
- Did they use Circular Binary Segmentation or ASCAT's own ASPCF method?
- Did they do GC correction?
- How do these ASCAT tables relate to the other copy number tables available in that section of TCGA, e.g. the copy number segment table? Are they derived from those tables or computed independently from the CEL files/SNP6 arrays?
- How do the two ASCAT tables relate to each other?
Is there documentation anywhere that explains the processing steps in detail, or the code for each step of the pipeline?
Thanks a million for any help.