
CNV not called



8.6 years ago

vps767 ▴ 20

I am attempting to use CNVkit and have successfully run it on a test exome sequencing sample (non-cancer blood) against a panel of other samples (non-cancer blood). The test sample contains a very large homozygous deletion that should be trivial to detect. However, the deletion is not called by cbs (with default parameters or with threshold 0.2) or by flasso, and those runs print a warning like:

DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False. data = self._reader.read(nrows)
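For context, this DtypeWarning comes from pandas, which CNVkit uses to read tables: with low_memory enabled, pandas infers column types chunk by chunk and warns when different chunks of one column look like different types (for example, chromosome names "1" next to "X"). The stdlib-only sketch below is an illustration, not CNVkit code; it flags columns whose values mix numeric and non-numeric strings. The function name and the sample rows are hypothetical, and nothing here establishes that the warning is what causes the missed call.

```python
import csv
import io


def mixed_type_columns(tsv_text):
    """Return names of columns that mix numeric and non-numeric values.

    Hypothetical helper for illustration; not part of CNVkit.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kinds = {}  # column name -> set of kinds seen ("numeric" and/or "text")
    for row in reader:
        for name, value in row.items():
            try:
                float(value)
                kind = "numeric"
            except ValueError:
                kind = "text"
            kinds.setdefault(name, set()).add(kind)
    return sorted(name for name, seen in kinds.items() if len(seen) > 1)


# Made-up rows: chromosome names "1" and "X" mix numbers and text.
sample = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "1\t1000\t2000\tGENE_A\t0.05\n"
    "X\t5000\t6000\tGENE_B\t-0.10\n"
)
print(mixed_type_columns(sample))  # ['chromosome']
```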

HaarSeg detects the deletion, but it calls 1255 segments and the output loses the gene names.

The .cnr file clearly shows the deletion, so the early processing steps are working. Running CBS manually in R (DNAcopy) easily detects the deletion with default parameters, with or without weights; see below:

cnvkit.py batch ../shortcuts/C1-25.bam -r /mnt/capture/cnvkit/Sept30_2015_reference.cnn  --output-dir results/
cnvkit.py segment C1-25.cnr -m cbs

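library("DNAcopy")
# Read CNVkit's tab-separated bin-level log2 ratios
datatab <- read.table("C1-25.cnr", header=TRUE, sep="\t", comment.char="")
# presorted=TRUE keeps the .cnr row order, so the weights below stay aligned with the bins
CNA.object <- CNA(datatab$log2, datatab$chromosome, datatab$start,
                  data.type="logratio", sampleid="C1-25", presorted=TRUE)
segment.CNA.object <- segment(CNA.object, weights=datatab$weight, verbose=1)
print(segment.CNA.object)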
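To check which genes fall inside the deletion without relying on any segmentation algorithm, one can scan the .cnr bins for consecutive runs of strongly negative log2 ratios. The sketch below is a plain threshold scan, not CBS and not CNVkit's gainloss command. The -3.0 cutoff, the minimum run length, the function name, and the example rows are assumptions chosen for illustration. In a pure germline sample a one-copy loss sits near log2 = -1, while a homozygous deletion gives far lower values.

```python
from itertools import groupby


def deleted_runs(bins, max_log2=-3.0, min_bins=3):
    """Group consecutive low-log2 bins on the same chromosome (hypothetical helper).

    bins: list of (chromosome, start, end, gene, log2) tuples, sorted by position.
    Returns (chromosome, start, end, genes) for each qualifying run.
    """
    runs = []
    # Key each bin by (chromosome, is_low); consecutive bins sharing a key form one run.
    for (chrom, is_low), group in groupby(bins, key=lambda b: (b[0], b[4] <= max_log2)):
        group = list(group)
        if is_low and len(group) >= min_bins:
            genes = []
            for b in group:
                if b[3] not in genes:  # keep gene names unique, in genomic order
                    genes.append(b[3])
            runs.append((chrom, group[0][1], group[-1][2], genes))
    return runs


# Made-up bins: GENE_B and GENE_C sit inside a homozygous deletion.
bins = [
    ("chr1", 100, 200, "GENE_A", 0.02),
    ("chr1", 200, 300, "GENE_B", -6.5),
    ("chr1", 300, 400, "GENE_B", -7.1),
    ("chr1", 400, 500, "GENE_C", -6.8),
    ("chr1", 500, 600, "GENE_D", -0.05),
]
print(deleted_runs(bins))
# [('chr1', 200, 500, ['GENE_B', 'GENE_C'])]
```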

Any help would be greatly appreciated.

Vince

exome-copy-number cnvkit • 3.2k views

updated 19 months ago by Ram 43k • written 8.6 years ago by vps767 ▴ 20



Would you mind tagging this question with "cnvkit" so it's easier to find? I missed it earlier, sorry.

8.6 years ago by Eric T. ★ 2.8k

Ram · Answer 1 · 2015-10-05

Thanks for the bug report. Both issues should now be resolved in the code on GitHub, and the fixes will be in the next CNVkit release:

There was a filter in place that removed very-low-coverage probes before segmentation. That makes sense for contaminated tumor samples, but not for germline samples, where the bins inside a homozygous deletion have almost no coverage and were therefore being discarded. I've made the filter optional.
For the HaarSeg issue, the gene names should now show up as they do for CBS and Fused Lasso.
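A toy illustration of why a low-coverage filter hides a homozygous deletion: with both copies gone, the deleted bins have almost no reads, so dropping "very-low-coverage" bins before segmentation removes exactly the evidence for the deletion, and the segmenter only sees the flanking normal bins. The depth cutoff, the function name, and the bin values below are hypothetical; they are not CNVkit's actual filter criterion or defaults.

```python
def drop_low_coverage(bins, min_depth=5.0):
    """Keep only bins whose read depth reaches min_depth (hypothetical cutoff)."""
    return [b for b in bins if b["depth"] >= min_depth]


# Made-up bins: the middle three lie in a homozygous deletion (near-zero depth).
bins = [
    {"gene": "GENE_A", "depth": 120.0, "log2": 0.01},
    {"gene": "GENE_B", "depth": 0.0, "log2": -9.0},
    {"gene": "GENE_B", "depth": 0.3, "log2": -8.2},
    {"gene": "GENE_C", "depth": 0.1, "log2": -9.7},
    {"gene": "GENE_D", "depth": 115.0, "log2": -0.02},
]

kept = drop_low_coverage(bins)
print([b["gene"] for b in kept])  # ['GENE_A', 'GENE_D']
# Every remaining bin has a log2 ratio near 0, so no segmenter can call a loss.
print(min(b["log2"] for b in kept))  # -0.02
```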

Alternative workarounds (for posterity):

In the absence of a .cns file for your sample, you can use CNVkit's gainloss command to identify the genes likely affected by CNVs.
If you've segmented the .cnr file directly in R, the printed output data frame is probably in SEG format or close to it; in that case you can convert it back to CNVkit's .cns format with the import-seg command.
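For the import-seg route, DNAcopy's printed segment table has the columns ID, chrom, loc.start, loc.end, num.mark and seg.mean, which matches the common tab-separated SEG layout. The sketch below writes such rows to a .seg file from Python. The segment values, the function name, and the output filename are made up, and it is worth checking the resulting file against CNVkit's import-seg documentation before relying on it.

```python
import csv

SEG_COLUMNS = ["ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean"]


def write_seg(path, segments):
    """Write segments as a tab-separated SEG file with a header row (hypothetical helper).

    segments: iterable of tuples in SEG_COLUMNS order.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(SEG_COLUMNS)
        writer.writerows(segments)


# Made-up segments in the shape DNAcopy prints: normal, deleted, normal.
segments = [
    ("C1-25", "chr1", 100, 200, 40, 0.01),
    ("C1-25", "chr1", 200, 500, 25, -7.12),
    ("C1-25", "chr1", 500, 900, 60, -0.03),
]
write_seg("C1-25.seg", segments)

with open("C1-25.seg") as handle:
    print(handle.read())
```

The resulting file could then be passed to something like cnvkit.py import-seg C1-25.seg; see the command's help for options such as chromosome-name handling.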