Question

Calling CNA from aCGH: Thresholding?

9.4 years ago

Noushin N ▴ 620

Hi everybody,

I am working on array CGH data from a set of tumor samples with potentially high normal contamination. I have performed the necessary pre-processing (background subtraction, within-array normalization) on the tumor data, and run circular binary segmentation on the resulting log ratio values.

I have seen that people often threshold the mean segment values following this step to call gains and losses (a typical value I have seen in the literature is a minimum absolute value of 0.3 for the log2 ratio). However, this process assumes a minimum tumor purity, which may not hold for a subset of my samples.

I was wondering if anyone can suggest a more systematic way of approaching this problem in the presence of high levels (50-80%) of normal tissue contamination in the tumor sample.

Thank you!

array-cgh copy-number • 2.6k views

updated 3.0 years ago by Ram 45k • written 9.4 years ago by Noushin N ▴ 620

Have you estimated the tumor purity of each of your samples already, or is that the next step you're looking to do?

9.4 years ago by Eric T. ★ 2.9k

I do have some (possibly not highly accurate) estimates from somatic mutation allele frequency data, and also from the pathology report (rougher still).

updated 5.5 years ago by Ram 45k • written 9.4 years ago by Noushin N ▴ 620

Ram · Accepted Answer · 2016-02-04

In the absence of purity estimates, thresholding is pretty much your only option. If your data is noisy, you might opt to focus on high-level amplifications and homozygous deletions in your report. Given another source of data like SNV calls, you can also use the allele frequencies to detect loss of heterozygosity, which can at least support copy number calls for hemizygous losses.

Given purity estimates, you can rescale the log2 values to subtract the expected contribution of the normal cells, and then use thresholds more confidently.
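To make the rescaling concrete, here is a minimal sketch of the arithmetic, assuming an autosomal segment, a diploid normal reference, and a simple two-population mixture (tumor plus normal). The function names and the `floor` parameter are made up for illustration; this is not CNVkit's code.

```python
import math

def expected_log2(tumor_copies, purity, normal_copies=2):
    """Log2 ratio you would expect to observe for a segment with
    `tumor_copies` copies in the tumor cells, diluted by normal cells."""
    mixed = purity * tumor_copies + (1 - purity) * normal_copies
    return math.log2(mixed / normal_copies)

def rescale_log2(log2_ratio, purity, normal_copies=2, floor=-5.0):
    """Remove the normal-cell contribution from an observed log2 ratio.
    `floor` is an arbitrary value returned when the implied tumor copy
    number is zero or negative (e.g. a noisy homozygous deletion)."""
    observed = normal_copies * 2 ** log2_ratio
    tumor = (observed - (1 - purity) * normal_copies) / purity
    if tumor <= 0:
        return floor
    return math.log2(tumor / normal_copies)

# A single-copy gain (3 copies) at 50% purity sits near the usual 0.3
# cutoff, but at 20% purity it falls well below it.
print(round(expected_log2(3, 0.5), 2))  # 0.32
print(round(expected_log2(3, 0.2), 2))  # 0.14
```

Under these assumptions, a 0.3 cutoff only catches single-copy gains when purity is roughly 50% or higher, which is why rescaling first helps for heavily contaminated samples.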

For example, if your segmented aCGH calls are in SEG format (the output of DNAcopy), then you can use CNVkit to adjust the log2 values:

Load a sample into CNVkit's own format:

```
cnvkit.py import-seg Sample.seg -o Sample.cns
```

Rescale the log2 values using a purity estimate and optionally the ploidy:

```
cnvkit.py rescale Sample.cns --purity 0.45 --ploidy 2 -o Sample-rescaled.cns
```

Optionally, perform the thresholding by assuming 100% purity (now valid) and rounding log2 values to the nearest absolute integer copy number:
```
cnvkit.py call -m clonal Sample-rescaled.cns -o Sample-called.cns
```
Or by using hard cutoffs (in log2 scale):
```
cnvkit.py call -m threshold -t=-1.1,-0.4,0.3,0.7 Sample-rescaled.cns -o Sample-called.cns
```
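As a rough illustration of what hard cutoffs do, the sketch below maps a rescaled log2 value to an integer copy number using the same four thresholds as the command above. It assumes that values up to the first threshold are called 0 copies, values up to the second are called 1, and so on, and it simply caps everything above the last threshold at 4 copies. The function name is hypothetical and this is not CNVkit's implementation.

```python
from bisect import bisect_left

def call_copy_number(log2_ratio, thresholds=(-1.1, -0.4, 0.3, 0.7)):
    """Return an integer copy number for a (purity-rescaled) log2 ratio.
    The result is the count of thresholds strictly below the value, so
    a value equal to a threshold falls into the lower bin."""
    return bisect_left(thresholds, log2_ratio)

for value in (-2.0, -0.6, 0.0, 0.585, 1.2):
    print(value, call_copy_number(value))
```

With these cutoffs, a rescaled single-copy gain (log2 of about 0.585) is called as 3 copies, and anything between -0.4 and 0.3 stays at the neutral 2 copies.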

Export the adjusted segments back to SEG:

```
cnvkit.py export seg Sample-called.cns -o Sample-called.seg
```

Or to another format:

```
cnvkit.py export bed Sample-called.cns -o Sample-called.bed
cnvkit.py export vcf Sample-called.cns -o Sample-called.vcf
```