Question

Explanation of the call command in CNVkit

7.1 years ago

Hällyss ▴ 90

I am trying to describe the different modes of the call command as accurately as possible.

Please tell me if I am wrong, and fill in anything that is missing.

The call command estimates the absolute copy number of each segment. There are two calculation modes:

Threshold mode, which applies a scale of log2 cutoffs to categorize segments. Specifying 4 thresholds gives 5 categories: 0X, 1X, 2X, 3X and 4X+.
Clonal mode, which takes ploidy into account. It relates the ploidy of each chromosome to the log2 value of each segment. By default the ploidy is 2X for autosomes.

I still have questions:

How is the baseline value set in clonal mode? From the mean log2 of the chromosome? From the mean log2 of the whole sample? Or in some other way (which one)?
How is it then decided that a segment carries a deletion or a duplication? By thresholds? By a proportionality calculation followed by rounding?
Is clonal mode better suited to pure somatic samples, or to samples with 20-80% tumor cells or even less?
Is threshold mode better suited to pure somatic samples, or to samples with 20-80% tumor cells or even less?
Does rescaling the sample make it possible to use one mode rather than the other? What is the limit? (We sometimes have samples with 5% tumor cells, and I have the impression that rescaling creates more noise than real calls.)

Thank you for your tool, your community and the responsiveness of your answers (which are very helpful). For all of these reasons we have decided to use your tool routinely for CNV detection in our laboratory.

Kind regards,

Alice

cnvkit call • 2.2k views

updated 7.1 years ago by Eric T. ★ 2.8k • written 7.1 years ago by Hällyss ▴ 90

score 1 · Answer 1 · 2017-03-20

The segment log2 values are already "centered" so that a log2 value of 0.0 corresponds to what is believed to be neutral copy number, i.e. 2x for autosomes. In the .cns file output from the segment command, the log2 value is the weighted mean of the .cnr bin log2 values within each segment, where the weights come from the weight column of the .cnr file. The call --center option lets you re-center the segment log2 values using a different function for finding the "center" value, in case you suspect that the weighted mean did not provide a good center point -- you'll usually notice this in the scatter plots. (I'm not sure I fully understood this question.)
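To make the weighted mean and the re-centering step concrete, here is a minimal Python sketch. It is only an illustration of the arithmetic, not CNVkit's own code: the function names are hypothetical, the bin values are made up, and the plain unweighted median used as the "center" is a simplifying assumption.

```python
import statistics


def weighted_mean(log2_values, weights):
    """Weighted mean of bin log2 values (hypothetical helper, not CNVkit's API)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(log2_values, weights)) / total


def recenter(segment_log2s, center_func=statistics.median):
    """Shift all segment log2 values so the chosen center lands on 0.0."""
    center = center_func(segment_log2s)
    return [v - center for v in segment_log2s]


# Made-up bins belonging to one segment: log2 ratio and weight of each bin.
bin_log2 = [0.10, 0.30, -0.20]
bin_weight = [1.0, 0.5, 0.5]
segment_log2 = weighted_mean(bin_log2, bin_weight)

# Made-up segment values for a sample whose baseline sits a little above 0.0.
shifted = recenter([0.1, 0.3, 0.2])
print(segment_log2, shifted)
```

The point is only that a segment's log2 is an average over its bins, and that re-centering subtracts a single sample-wide value from every segment.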
This is based on thresholds, either fixed as in the default -m threshold method or based on rounding to the nearest integer copy number with the -m clonal method. Details here.
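A rough Python sketch of the two calling rules may help. It is an illustration under stated assumptions, not CNVkit's implementation: the cutoffs below are what I believe the documented defaults to be (check `cnvkit.py call --help` for your version), the function names are hypothetical, copy numbers above the last cutoff are simply reported as 4 (meaning "4 or more"), and the purity correction assumes the contaminating normal cells carry `ploidy` copies.

```python
import bisect

# Assumed default cutoffs for 0/1, 1/2, 2/3 and 3/4+ copies; verify for your version.
DEFAULT_THRESHOLDS = (-1.1, -0.25, 0.2, 0.7)


def call_threshold(log2, thresholds=DEFAULT_THRESHOLDS):
    """Copy number = index of the first cutoff that log2 does not exceed."""
    return bisect.bisect_left(thresholds, log2)


def call_clonal(log2, ploidy=2, purity=1.0):
    """Convert log2 to absolute copies, correct for purity, round to an integer."""
    copies = (ploidy * 2 ** log2 - ploidy * (1 - purity)) / purity
    return max(0, round(copies))


for value in (-1.5, -0.3, 0.0, 0.25, 1.0):
    print(value, call_threshold(value), call_clonal(value))
```

With these assumed cutoffs, a segment at log2 = -0.3 is called a single-copy loss by the threshold rule but stays at 2 copies under plain rounding, which is the sensitivity-versus-specificity trade-off between the two methods.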
The clonal method is used if you know the purity and ploidy, especially in non-human and/or non-diploid genomes. It's probably better to use this with germline samples.
The threshold method provides reasonable heuristic cutoffs that work well for human cancer samples. It's best if you have an estimate of the tumor purity, but if you don't, the calls should be OK down to about 50% purity. If you have no idea what the purity is, you can try shrinking the thresholds, specifying a new set of cutoffs closer to 0.0 for single-copy loss and gain. (The default thresholds already do this a little, relative to clonal -- it assumes you somewhat prefer sensitivity over specificity.)
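To see why the cutoffs need to move toward 0.0 as purity drops, one can compute the log2 ratio expected for a single-copy loss or gain in a tumor/normal mixture. This is a minimal sketch using the standard mixture model, assuming a diploid normal background; the function name is hypothetical and it is not part of CNVkit.

```python
import math


def expected_log2(tumor_copies, purity, normal_copies=2):
    """Expected log2 ratio when a fraction `purity` of cells carry `tumor_copies`."""
    mixed = purity * tumor_copies + (1 - purity) * normal_copies
    if mixed <= 0:
        raise ValueError("log2 is undefined for zero average copies")
    return math.log2(mixed / normal_copies)


for purity in (1.0, 0.5, 0.2, 0.05):
    loss = expected_log2(1, purity)  # single-copy loss
    gain = expected_log2(3, purity)  # single-copy gain
    print(f"purity={purity:.2f}  loss={loss:+.3f}  gain={gain:+.3f}")
```

At 100% purity a single-copy loss sits at log2 = -1.0, but at 50% purity it is already near -0.4, and at 5% purity both loss and gain are within a few hundredths of 0.0, which is on the order of typical segment noise.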
With 5% purity, you should do some additional work to reduce noise in the calls. Also use a tool to estimate tumor purity in your pipeline, e.g. BubbleTree, Sequenza, or PyClone. I recommend using segmetrics --ci to estimate confidence intervals, then call --filter ci to drop the calls with weak support, in addition to using --purity to rescale. You should probably benchmark both threshold and clonal to see which works best for your samples.
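The segmetrics-then-call sequence could be scripted roughly as follows. This is a sketch, not a tested pipeline: the file names and the `build_commands` helper are hypothetical, the purity value is a placeholder for whatever your purity estimator reports, and the commands are only printed here rather than executed.

```python
def build_commands(sample, purity, method="clonal"):
    """Return the two CNVkit command lines as argument lists (hypothetical helper)."""
    segmetrics = [
        "cnvkit.py", "segmetrics", f"{sample}.cnr",
        "-s", f"{sample}.cns",
        "--ci",                        # add confidence intervals to each segment
        "-o", f"{sample}.segmetrics.cns",
    ]
    call = [
        "cnvkit.py", "call", f"{sample}.segmetrics.cns",
        "--filter", "ci",              # drop calls with weak support
        "-m", method,                  # "clonal" or "threshold"
        "--purity", str(purity),       # rescale by the estimated tumor purity
        "-o", f"{sample}.call.cns",
    ]
    return [segmetrics, call]


for cmd in build_commands("Sample", 0.05):
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Running it once with `method="clonal"` and once with `method="threshold"` on the same samples is one simple way to do the benchmark suggested above.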