CNVkit: small CNV calling
4.6 years ago
Hällyss ▴ 60

Hello everyone,

For a few weeks, we have been using CNVkit to detect CNVs of the size of a gene in our somatic panel. Enthused by the results, we carried out our study on our constitutional panel this time. Most CNVs detected in this panel measure 1 to 3 exons. These CNVs are not seen by CNVkit. Is this normal?

Alice

cnv cnvkit single-exon • 2.7k views
2
4.5 years ago
Hällyss ▴ 60

I searched the manual and found this passage:

However, note that CNVkit is less accurate in detecting CNVs smaller than 1 Mbp, typically only detecting variants that span multiple exons or captured regions. When used on exome or target panel datasets, CNVkit will not detect the small CNVs that are more common in populations.

But if we create a BED file with small regions (e.g. 25 or even 12 bp) using the -a option, it is possible to see small CNVs, down to 2 contiguous exons.

1

Hi Alice,

I am also planning to use CNVkit for my constitutional samples, and I am wondering about the lower size limit of the CNVs that CNVkit can detect. What is the size of the 2 contiguous exons that you detected? It would be helpful if you could provide more details on the BED file that you generated and the command line that you used to create it with smaller regions.

1

Our capture is about 420 kb (only 35 genes). We sequence samples to a depth of coverage of about 300x. In the end, the 2 CNVs detected (no false positives) are 2411 and 98 bases long. Two other single-exon CNVs were tested, and they could not be detected. They measure 54 and 53 bp.

For the command line:

cnvkit.py target Capture.bed --split -a 12 -o my_targets.bed

cnvkit.py antitarget my_targets.bed -a 15000 -g data/access-5k-mappable.hg19.bed -o my_antitargets.bed

15000 is the value that gives a similar average coverage per bin in both target and off-target regions.

data/access-5k-mappable.hg19.bed is a file in the CNVkit directory.

The baseline is created with all the samples of the run (positives are unknown). The rest of the pipeline (coverage, fix, CBS segmentation and call thresholds) has not been modified from what is proposed in the manual.

Do not hesitate to ask if you have further questions.

Alice

0

Hi Alice, thank you for sharing this information! I'm just starting with CNVkit and wanted to know whether it is still working well for you and whether you have made any modifications since.

Bruno

1
4.5 years ago
Eric T. ★ 2.7k

Yes, CNVkit and other segmentation-based copy number callers struggle to accurately detect CNVs in constitutional samples. Using default settings, a single-exon CNV won't show up with the segmenters currently available (though you could in theory use the 'spread' and 'log2' columns in a pooled reference as the basis for a Z-test of each exon in your capture -- this is not yet supported directly).

If your sequencing data are high quality then you can subdivide the targets and antitargets more finely, as your other comment mentions, though this can result in more noise as well. Then if you've managed to increase the sensitivity of CNVkit on your data and are now seeing poor specificity, you can reduce false positives with the segmetrics --ci and call --filter ci commands.

0

I would like to try the "in theory" method you mentioned to attempt to call CNVs for single exons. I understand how the 'log2' column can be used, but how can 'spread' be used for the Z-test? Could you share your ideas?

0

In my experience, we can detect single-exon CNVs in constitutional samples. We use capture; amplicon does not work well. The targets are cut into bins of 20 bases. The antitarget bin size depends on the depth and the on-target rate. (We have 100X and 70% on-target, so we cut at around 20000 bases.) Then we run CNVkit as described in the manual.

If you have questions, do not hesitate to ask!

Alice

0

How did you reach that 20 kb value? Would you mind sharing how you calculated it?

0

The autobin command will do this calculation for you, or you can see the source code (cnvlib.autobin) and documentation for that command to see how it's done.
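
As a rough illustration of the idea only (not the actual cnvlib.autobin code), one way to pick a bin size is to aim for a roughly constant number of sequenced bases per bin, so that low-depth off-target regions get wide bins and high-depth targets get narrow ones. The function name, the 100,000-base budget, the size limits and the 5x off-target depth below are all assumptions made for this sketch; under them the result happens to come out at 20000 bases.

```python
def bin_size_for_depth(depth, bases_per_bin=100_000, min_size=20, max_size=500_000):
    """Hypothetical helper: choose a bin width from the average depth.

    depth          average read depth in the region type (target or off-target)
    bases_per_bin  assumed budget of sequenced bases per bin (illustrative value)
    min_size/max_size  assumed limits that keep bins from getting absurd
    """
    if depth <= 0:
        # No coverage at all: fall back to the widest allowed bin.
        return max_size
    size = int(round(bases_per_bin / depth))
    # Clamp the result to the allowed range.
    return max(min_size, min(size, max_size))


# Illustrative depths only: 5x off-target and 300x on-target.
print(bin_size_for_depth(5.0))
print(bin_size_for_depth(300.0))
```

For real data, running autobin on your own BAM files is the safer route, since it measures the on- and off-target depths rather than assuming them.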

0

You can use the spread value as an estimate of variance, so the square root of that is your standard deviation parameter, and log2 is the mean.
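
A minimal sketch of that per-bin Z-test, following the description above: the reference 'log2' is taken as the mean and the square root of 'spread' as the standard deviation. The function name and the example values are hypothetical, and this is not a supported CNVkit feature. If 'spread' in your reference file is already on the standard-deviation scale, pass spread_is_variance=False so the square root is skipped.

```python
import math
from statistics import NormalDist


def exon_z_test(sample_log2, ref_log2, ref_spread, spread_is_variance=True):
    """Two-sided Z-test of one bin's log2 value against the pooled reference.

    Assumption from the thread: 'spread' estimates the variance, so its
    square root is the standard deviation and 'log2' is the mean.
    """
    sd = math.sqrt(ref_spread) if spread_is_variance else ref_spread
    if sd <= 0:
        raise ValueError("spread must be positive")
    z = (sample_log2 - ref_log2) / sd
    # Two-sided p-value under a standard normal distribution.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p


# Hypothetical values: a bin one log2 unit below a quiet reference bin.
z, p = exon_z_test(sample_log2=-1.0, ref_log2=0.0, ref_spread=0.04)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With many bins per capture, the p-values would need a multiple-testing correction before being interpreted, and bins belonging to the same exon could be combined to gain power.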