Question

Problem building a normal reference with CNVkit

5.0 years ago

prosium ▴ 20

Hello,

I am trying to use CNVkit with paired WGS data and got an error while building the normal reference.

1) $ cnvkit.py access hg19.fa -o access.hg19.bed

2) $ cnvkit.py autobin GroupA_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ cnvkit.py autobin GroupA_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ cnvkit.py autobin GroupB_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ cnvkit.py autobin GroupB_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

3) $ cnvkit.py coverage -p 5 GroupA_Normal.bam GroupA_Normal.target.bed -o GroupA_Normal.targetcoverage.cnn

$ cnvkit.py coverage -p 5 GroupA_Tumor.bam GroupA_Tumor.target.bed -o GroupA_Tumor.targetcoverage.cnn

$ cnvkit.py coverage -p 5 GroupB_Normal.bam GroupB_Normal.target.bed -o GroupB_Normal.targetcoverage.cnn

$ cnvkit.py coverage -p 5 GroupB_Tumor.bam GroupB_Tumor.target.bed -o GroupB_Tumor.targetcoverage.cnn

We also ran the same coverage step on the antitarget BED files.

4) The error occurred at this step:

$ cnvkit.py reference Group*Normal.{,anti}targetcoverage.cnn --fasta hg19.fa -o my_reference.cnn

..... Correcting for GC bias...

Correcting for density bias...

Loading target GroupB_Normal.targetcoverage.cnn

Traceback (most recent call last):
  File "/root/anaconda3/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/commands.py", line 518, in _cmd_reference
    args.do_rmask)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/reference.py", line 55, in do_reference
    do_gc, do_edge, False)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/reference.py", line 237, in combine_probes
    % (fname, filenames[0]))
RuntimeError: GroupB_Normal.targetcoverage.cnn bins do not match those in GroupA_Normal.targetcoverage.cnn

Please let me know if you have any advice or suggestions.

Thank you in advance.

cnvkit normal-reference paired-WGS • 2.6k views

ADD COMMENT • link updated 5.0 years ago by Eric T. ★ 2.8k • written 5.0 years ago by prosium ▴ 20

score 0 · Answer 1 · 2019-04-15

5.0 years ago

Eric T. ★ 2.8k

The autobin command is meant to be run just once per cohort, using a representative sample (e.g. GroupA_Normal). It looks at the on- and off-target coverage depths of that sample to choose reasonable bin sizes, and then generates corresponding bin coordinates as BED files. For processing multiple samples together, you need to use the same set of BED files each time, otherwise the bin coordinates won't line up across samples.
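As a rough sketch of what that could look like with the file names from the question: autobin runs once on GroupA_Normal, and every sample's coverage is then computed against that one pair of BED files. The `DRY_RUN` switch and the helper function names are hypothetical, added only so the commands can be printed and inspected before running them. The sketch also assumes autobin wrote its output as GroupA_Normal.target.bed and GroupA_Normal.antitarget.bed, and that the first three tab-separated columns of a .cnn file are chromosome, start and end.

```shell
# Sketch only: file names follow the question; DRY_RUN and the helper
# functions are hypothetical names introduced for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = also execute them

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Bin coordinates are chosen once, from one representative sample.
choose_bins() {
    run cnvkit.py autobin "$1.bam" -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt
}

# Every sample is then counted against that same pair of BED files.
# Assumes autobin named its outputs <rep>.target.bed and <rep>.antitarget.bed.
coverage_with_shared_bins() {
    rep="$1"; shift
    for sample in "$@"; do
        run cnvkit.py coverage -p 5 "$sample.bam" "$rep.target.bed" -o "$sample.targetcoverage.cnn"
        run cnvkit.py coverage -p 5 "$sample.bam" "$rep.antitarget.bed" -o "$sample.antitargetcoverage.cnn"
    done
}

# Succeeds only if two .cnn files have identical chromosome/start/end columns.
bins_match() {
    [ "$(cut -f1-3 "$1")" = "$(cut -f1-3 "$2")" ]
}

choose_bins GroupA_Normal
coverage_with_shared_bins GroupA_Normal GroupA_Normal GroupA_Tumor GroupB_Normal GroupB_Tumor
```

Before pooling the normals, something like `bins_match GroupA_Normal.targetcoverage.cnn GroupB_Normal.targetcoverage.cnn && echo OK` is a quick way to confirm the coordinates line up, which is the condition the RuntimeError above complains about.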

ADD COMMENT • link 5.0 years ago by Eric T. ★ 2.8k

For case-control samples, the manual on the homepage says to store all the samples in one folder and run 'batch'.

However, I do not have enough disk space for that, so I first run the 'access' command on the reference genome, use its output BED file to calculate coverage, and build the reference from the normal samples only.

It seems to work well, but is it all right to skip the autobin step? Thank you in advance.

1) cnvkit.py access hg38.fa -o access.hg38.bed
2-1) cnvkit.py coverage -p 4 GroupA_Normal.bam access.hg38.bed -o GroupA_Normal.targetcoverage.cnn
2-2) cnvkit.py coverage -p 4 GroupB_Normal.bam access.hg38.bed -o GroupB_Normal.targetcoverage.cnn
2-3 and 2-4) Calculate coverage of tumor samples using 'access.hg38.bed'
3) cnvkit.py reference Group*Normal.targetcoverage.cnn -f hg38.fa -o pooled-normal_reference.cnn
4) cnvkit.py fix GroupA_Tumor.targetcoverage.cnn pooled-normal_reference.cnn -o GroupA_Tumor.cnr
5) cnvkit.py segment -m hmm GroupA_Tumor.cnr -o GroupA_Tumor.cns
Likewise for GroupB_Tumor.

ADD REPLY • link 5.0 years ago by prosium ▴ 20

I suggest adding the target command with a bin size of 5000 after step 1, then using the resulting BED file as input to the coverage command. Otherwise the bins will be huge.
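A minimal sketch of that extra step, using the hg38 file names from the comment above. The output name wgs_bins.hg38.bed, the `DRY_RUN` switch and the helper function names are hypothetical. As far as I understand the target command, `--avg-size` only takes effect together with `--split`, so both are passed here; check `cnvkit.py target -h` for your installed version.

```shell
# Sketch only: wgs_bins.hg38.bed is a hypothetical output name, and DRY_RUN
# is a hypothetical switch (1 = only print commands, 0 = also execute them).
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Subdivide the accessible regions from step 1 into bins of roughly 5000 bp.
make_bins() {
    run cnvkit.py target access.hg38.bed --split --avg-size 5000 -o wgs_bins.hg38.bed
}

# Compute coverage against the binned BED file instead of access.hg38.bed.
bin_coverage() {
    run cnvkit.py coverage -p 4 "$1.bam" wgs_bins.hg38.bed -o "$1.targetcoverage.cnn"
}

make_bins
for s in GroupA_Normal GroupB_Normal GroupA_Tumor GroupB_Tumor; do
    bin_coverage "$s"
done
```

The reference, fix and segment steps would then run on the resulting .cnn files as before, since every sample shares the same bin coordinates.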

ADD REPLY • link 5.0 years ago by Eric T. ★ 2.8k

In the batch command, does autobin run only on the median-sized file among the tumor samples, or on the median-sized file among all tumor and normal samples?