Question: CNVkit : small CNV calling
10 months ago by
alice.choury50 wrote:

Hello everyone,

For a few weeks, we have been using CNVkit to detect gene-sized CNVs in our somatic panel. Encouraged by the results, we then ran the same analysis on our constitutional panel. Most CNVs found in this panel span 1 to 3 exons, and CNVkit does not see them. Is this normal?

Thank you for your answer,


cnv single-exon cnvkit • 656 views
ADD COMMENTlink modified 9 months ago by Eric T.1.8k • written 10 months ago by alice.choury50
9 months ago by
alice.choury50 wrote:

I searched the manual and found this passage:

However, note that CNVkit is less accurate in detecting CNVs smaller than 1 Mbp, typically only detecting variants that span multiple exons or captured regions. When used on exome or target panel datasets, CNVkit will not detect the small CNVs that are more common in populations.

But if we create a BED file with small regions (e.g. 25 or even 12 bp) using the -a option, it is possible to see small CNVs, down to 2 contiguous exons.

ADD COMMENTlink modified 9 months ago • written 9 months ago by alice.choury50

Hi Alice,

I am also planning to use CNVkit for my constitutional samples, and I am wondering about the lower size limit of CNVs that CNVkit can detect. What is the size of the 2 contiguous exons that you detected? It would be helpful if you could provide more details on the BED file you generated and the command line you used to create the BED file with smaller regions.

Thank you in advance

ADD REPLYlink written 9 months ago by jainythomas110

Our capture is about 420 kb (only 35 genes). We sequence samples with a depth of coverage of about 300b. In the end, the 2 CNVs detected (with no false positives) are 2411 and 98 bases long. Two other single-exon CNVs were tested and could not be detected; they measure 54 and 53 bp.

For the command line:

    cnvkit.py target Capture.bed --split -a 12 -o my_targets.bed
    cnvkit.py antitarget my_targets.bed -a 15000 -g data/access-5k-mappable.hg19.bed -o my_antitargets.bed

15000 is the value that gives a similar average coverage in target and off-target bins.

data/access-5k-mappable.hg19.bed is a file in the cnvkit directory.

The baseline is created from all the samples of the run (the positives are not known in advance). The rest of the commands (coverage, fix, CBS segmentation and call thresholds) are unchanged from what the manual proposes.

Do not hesitate to ask if you have further questions.


ADD REPLYlink modified 9 months ago • written 9 months ago by alice.choury50
9 months ago by
Eric T.1.8k
San Francisco, CA
Eric T.1.8k wrote:

Yes, CNVkit and other segmentation-based copy number callers struggle to accurately detect CNVs in constitutional samples. Using default settings, a single-exon CNV won't show up with the segmenters currently available (though you could in theory use the 'spread' and 'log2' columns in a pooled reference as the basis for a Z-test of each exon in your capture -- this is not yet supported directly).

If your sequencing data are high quality then you can subdivide the targets and antitargets more finely, as your other comment mentions, though this can result in more noise as well. Then if you've managed to increase the sensitivity of CNVkit on your data and are now seeing poor specificity, you can reduce false positives with the segmetrics --ci and call --filter ci commands.
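To illustrate the idea behind filtering on confidence intervals, here is a minimal Python sketch, not CNVkit's own code. It assumes each segment carries a lower and upper bound for its log2 ratio, stored under the keys `ci_lo` and `ci_hi` (names chosen for this example), and it treats any segment whose interval includes zero as neutral. The segment values are hypothetical, and any merging of neighbouring segments is left out.

```python
def classify_by_ci(segments):
    """Label each segment as gain, loss or neutral from its log2 confidence interval.

    A segment is a gain if the whole interval lies above 0, a loss if the
    whole interval lies below 0, and neutral if the interval includes 0.
    """
    calls = []
    for seg in segments:
        if seg["ci_lo"] > 0:
            label = "gain"
        elif seg["ci_hi"] < 0:
            label = "loss"
        else:
            label = "neutral"
        # Copy the segment so the input list is not modified.
        calls.append({**seg, "call": label})
    return calls


# Hypothetical segments, for illustration only.
example_segments = [
    {"gene": "GENE_A", "log2": 0.45, "ci_lo": 0.20, "ci_hi": 0.60},
    {"gene": "GENE_B", "log2": 0.02, "ci_lo": -0.10, "ci_hi": 0.10},
    {"gene": "GENE_C", "log2": -0.95, "ci_lo": -1.20, "ci_hi": -0.70},
]

for seg in classify_by_ci(example_segments):
    print(seg["gene"], seg["log2"], seg["call"])
```

With finely subdivided bins, noisy short segments tend to have wide intervals that include zero, so a rule of this kind removes them while keeping segments with consistent support.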

ADD COMMENTlink written 9 months ago by Eric T.1.8k

I would like to try the "in theory" method you mentioned to detect single-exon CNVs. I understand how the 'log2' column can be used, but how can the 'spread' column be used for the Z-test? Could you share your ideas?

ADD REPLYlink written 10 weeks ago by qyang20

In my experience, we can detect single-exon CNVs in constitutional samples. We use capture; amplicon does not work well. The targets are cut into bins of 20 bases. The antitarget bin size depends on the depth and the on-target rate (we have 100X and 70% on-target, so we cut at around 20000 bases). Then we run CNVkit as described in the manual.

If you have questions, do not hesitate to ask!


ADD REPLYlink written 9 weeks ago by alice.choury50

You can use the spread value as an estimate of variance, so the square root of that is your standard deviation parameter, and log2 is the mean.
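As a rough sketch of that calculation in Python (not a CNVkit feature), the function below takes a bin's log2 value from the sample, plus the 'log2' and 'spread' values for the same bin from the pooled reference, and returns a Z-score with a two-sided p-value under a normal model. It assumes the sample value is on the same scale as the reference 'log2' column and, following the description above, treats 'spread' as a variance. The function name and the example numbers are hypothetical.

```python
import math


def exon_z_test(sample_log2, ref_log2, ref_spread):
    """Two-sided Z-test of one bin's log2 value against the pooled reference.

    Assumes ref_spread is a variance estimate, so its square root is used
    as the standard deviation, and ref_log2 is used as the mean.
    """
    if ref_spread <= 0:
        raise ValueError("spread must be positive")
    sd = math.sqrt(ref_spread)
    z = (sample_log2 - ref_log2) / sd
    # Two-sided p-value for a standard normal: P(|Z| >= |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical bins: (name, sample log2, reference log2, reference spread).
example_bins = [
    ("exon_1", 0.05, 0.00, 0.04),
    ("exon_2", -1.00, 0.00, 0.04),
]

for name, sample_log2, ref_log2, ref_spread in example_bins:
    z, p = exon_z_test(sample_log2, ref_log2, ref_spread)
    print(f"{name}\tz={z:.2f}\tp={p:.3g}")
```

When every bin of a panel is tested this way, the p-values would need a multiple-testing correction, and an exon split into several small bins gives several correlated tests rather than independent ones.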

ADD REPLYlink written 9 weeks ago by Eric T.1.8k