Question: TCGA CNV data reformatting
3 months ago by
vctrm6720 wrote:

I have TCGA data that needs to be reformatted according to the following:

# 'x' is a matrix of segmented output from ASCAT, with at least the
#   following columns (column names are not important):
# 1: sample id
# 2: chromosome (numeric)
# 3: segment start
# 4: segment end
# 5: number of probes
# 6: total copy number
# 7: nA
# 8: nB
# 9: ploidy
# 10: contamination, aberrant cell fraction

However, I'm not sure how to do this. For one, I'm assuming nA and nB refer to allele-specific copy numbers. But TCGA only has the following data:

A segmentation file:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446      1       61735   98602   17      0.3913
DRAMA_p_TCGA_276_278_N_GenomeWideSNP_6_A04_1322446      1       228706  603590  16      -0.2696

A raw copynumber data file:

Composite Element REF   Signal
CN_473963       2.87
CN_473964       2.044

An allele-specific copynumber file:

Composite Element REF   Signal_A        Signal_B
SNP_A-8575125   1.865   0.026
SNP_A-8497791   1.843   -0.426

I'm not sure how to reformat these files into what is needed. Specifically, I don't see how I can get the allele-specific copy numbers (nA and nB) for each segment.

Does anyone have any suggestions?


Yes, you can safely add them. In short segments you sometimes do not have enough heterozygous SNPs to resolve major and minor, but the total should still be reliable. I guess that’s the reason there is a separate column for total in your template.

written 3 months ago by markus.riester500
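
The addition discussed above can be sketched in a few lines. This is only an illustration, not ASCAT's own code: the rows and their values are invented, the keys `nMajor` and `nMinor` simply mirror the column names mentioned in the thread, and mapping them onto the template's nA and nB columns is an assumption that should be checked against the template's documentation.

```python
# Hypothetical segment records; values are made up for illustration.
# The "nMajor"/"nMinor" keys mirror the ASCAT column names quoted in the thread.
segments = [
    {"sample": "S1", "chr": 1, "start": 61735, "end": 98602, "nMajor": 2, "nMinor": 1},
    {"sample": "S1", "chr": 1, "start": 228706, "end": 603590, "nMajor": 1, "nMinor": 0},
]


def add_total_copy_number(rows):
    """Return copies of the rows with an extra 'total' = nMajor + nMinor."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row["total"] = row["nMajor"] + row["nMinor"]
        out.append(new_row)
    return out


with_total = add_total_copy_number(segments)
for row in with_total:
    print(row["chr"], row["start"], row["end"], row["total"], row["nMajor"], row["nMinor"])
```

Because the sum is taken row by row, each total belongs to exactly the segment boundaries already present in that row; no new segmentation is introduced.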

Do you know how I might get the number of probes in each segment? I do not see an ASCAT output for that.

written 3 months ago by vctrm6720
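
One possible way to obtain a probe count per segment, offered as a sketch and not as something confirmed in this thread, is to count the probes whose genomic position falls inside each segment. It assumes you have a probe annotation giving chromosome and position for every probe, and it treats segment start and end as inclusive; the function name `count_probes` and all data below are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical probe annotation: (chromosome, position) per probe.
probes = [(1, 61800), (1, 70000), (1, 98602), (1, 300000), (2, 5000)]

# Hypothetical segments: (chromosome, start, end), boundaries assumed inclusive.
segments = [(1, 61735, 98602), (1, 228706, 603590), (2, 1, 4000)]


def count_probes(segments, probes):
    """Count probes lying within each segment, in the order of `segments`."""
    positions = defaultdict(list)
    for chrom, pos in probes:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()  # sorted positions allow binary search

    counts = []
    for chrom, start, end in segments:
        pos = positions.get(chrom, [])
        # Number of positions p with start <= p <= end.
        counts.append(bisect_right(pos, end) - bisect_left(pos, start))
    return counts


probe_counts = count_probes(segments, probes)
print(probe_counts)
```

Sorting once per chromosome and using binary search keeps this fast even with a large probe set and many segments.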
3 months ago by
markus.riester500 wrote:

You will need the ABSOLUTE or ASCAT output. TCGA pan-cancer papers provide ABSOLUTE output for most of the datasets. ABSOLUTE will get you the missing columns 6 to 10.


Thank you. I ran ASCAT and in the segmented output file, there are two columns named "nMajor nMinor". Could I just add these to get the total copy number? I am hesitant because perhaps I will get total CNV counts in segment regions that are not the same as the ones identified by ASCAT?

written 3 months ago by vctrm6720
