Question: Calling CNA from aCGH: Thresholding?
Noushin N (Baltimore, MD) wrote 3.8 years ago:

Hi everybody,

I am working on array CGH data from a set of tumor samples with potentially high normal contamination. I have performed the necessary pre-processing (background subtraction, within-array normalization) on the tumor data, and run circular binary segmentation on the resulting log ratio values.

I have seen that people often threshold the mean segment values after this step to call gains and losses (a typical value I have seen in the literature is a minimum absolute log2 ratio of 0.3). However, this approach assumes a minimum tumor purity, which may not hold for a subset of my samples.

I was wondering if anyone can suggest a more systematic way of approaching this problem in the presence of high levels (50-80%) of normal tissue contamination in the tumor sample.

Thank you!




Have you estimated the tumor purity of each of your samples already, or is that the next step you're looking to do?

Reply written 3.8 years ago by Eric T.

I do have some (possibly not highly accurate) estimates from somatic mutation allele frequency data, and rougher ones from the pathology report.


Reply written 3.8 years ago by Noushin N

Eric T. (San Francisco, CA) wrote 3.8 years ago:

In the absence of purity estimates, thresholding is pretty much your only option. If your data is noisy, you might opt to focus on high-level amplifications and homozygous deletions in your report. Given another source of data like SNV calls, you can also use the allele frequencies to detect loss of heterozygosity, which can support copy number calls for hemizygous losses at least.
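To make the allele-frequency idea concrete, here is a minimal sketch of what to expect at a germline heterozygous site inside a hemizygous loss. It assumes diploid normal cells and a tumor that lost exactly one of its two copies; the function name is made up for illustration and is not from any library.

```python
def expected_het_allele_fractions(purity):
    """Expected allele fractions at a germline heterozygous site in a region
    where tumor cells have lost one of their two copies.

    Assumption: normal cells are diploid and keep one copy of each allele.
    Returns (fraction of retained allele, fraction of lost allele).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average copies per cell: tumor cells carry 1 copy, normal cells carry 2.
    total = purity * 1 + (1 - purity) * 2
    retained = 1.0 / total            # one copy in every cell
    lost = (1 - purity) / total       # present only in normal cells
    return retained, lost


# At 40% purity the two alleles shift from 0.5/0.5 to 0.625/0.375.
print(expected_het_allele_fractions(0.4))
```

The lower the purity, the closer both fractions stay to 0.5, so the shift can be hard to separate from noise in heavily contaminated samples.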

Given purity estimates, you can rescale the log2 values to the levels you would expect from a pure tumor sample, and then use thresholds more confidently.
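A rough sketch of the arithmetic, assuming the contaminating normal cells and the reference are both diploid (the function names are illustrative, not CNVkit's API): the observed ratio is a purity-weighted mix of the tumor ratio and a normal ratio of 1, and that mix can be inverted.

```python
import math

NORMAL_COPIES = 2  # assumption: normal cells and the reference are diploid


def observed_log2(tumor_copies, purity):
    """Log2 ratio expected when tumor cells with `tumor_copies` copies are
    mixed with diploid normal cells at the given tumor purity."""
    mixed = purity * tumor_copies + (1 - purity) * NORMAL_COPIES
    if mixed <= 0:
        return float("-inf")
    return math.log2(mixed / NORMAL_COPIES)


def rescale_log2(log2_obs, purity):
    """Invert the mixture: estimate the log2 ratio of the pure tumor."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    ratio = (2 ** log2_obs - (1 - purity)) / purity
    if ratio <= 0:
        # Noise pushed the value below what zero tumor copies would give.
        return float("-inf")
    return math.log2(ratio)


obs = observed_log2(3, purity=0.3)    # about 0.20, below a 0.3 cutoff
pure = rescale_log2(obs, purity=0.3)  # about 0.58, i.e. log2(3/2)
print(round(obs, 2), round(pure, 2))
```

So a single-copy gain at 30% purity would be missed by a fixed 0.3 cutoff on the raw values, but is recovered once the values are rescaled. Note that rescaling also amplifies noise, more so at low purity.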

For example, if your segmented aCGH calls are in SEG format (the output of DNAcopy), then you can use CNVkit to adjust the log2 values:

  1. Load a sample into CNVkit's own format:
       cnvkit.py import-seg Sample.seg -o Sample.cns
  2. Rescale the log2 values using a purity estimate and optionally the ploidy:
       cnvkit.py rescale Sample.cns --purity 0.45 --ploidy 2 -o Sample-rescaled.cns
  3. Optionally, perform the thresholding by assuming 100% purity (now valid) and rounding log2 values to the nearest absolute integer copy number:
       cnvkit.py call -m clonal Sample-rescaled.cns -o Sample-called.cns
     Or by using hard cutoffs (in log2 scale):
       cnvkit.py call -m threshold -t=-1.1,-0.4,0.3,0.7 Sample-rescaled.cns -o Sample-called.cns
  4. Export the adjusted segments back to SEG:
       cnvkit.py export seg Sample-called.cns -o Sample-called.seg
     Or to another format:
       cnvkit.py export bed Sample-called.cns -o Sample-called.bed
       cnvkit.py export vcf Sample-called.cns -o Sample-called.vcf
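The two calling strategies in step 3 can be sketched in plain Python as follows. This only illustrates the idea and is not CNVkit's actual implementation; the function names are hypothetical, and reading the four cutoffs as the boundaries between 0, 1, 2, 3 and 4-or-more copies is my assumption.

```python
import bisect

CUTOFFS = (-1.1, -0.4, 0.3, 0.7)  # log2 cutoffs from the example above


def call_by_rounding(log2_value, ploidy=2):
    """Round a purity-corrected log2 ratio to the nearest integer copy number."""
    return max(0, round(ploidy * 2 ** log2_value))


def call_by_threshold(log2_value, cutoffs=CUTOFFS):
    """Bin a log2 ratio with hard cutoffs.

    Assumption: the bins map to 0, 1, 2, 3 and 4-or-more copies, and a value
    exactly equal to a cutoff falls into the lower bin.
    """
    return bisect.bisect_left(cutoffs, log2_value)


for value in (-1.5, -1.0, 0.0, 0.585, 1.0):
    print(value, call_by_rounding(value), call_by_threshold(value))
```

Rounding treats every segment as fully clonal, whereas hard cutoffs leave some slack around each level, which may be more forgiving when the purity estimate is only approximate.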

Thanks so much Etal!

Reply written 3.8 years ago by Noushin N