Question: Problem creating a normal reference with CNVkit
11 months ago, prosium wrote:


I am trying to use CNVkit with paired WGS data and got an error while building the normal reference.

1) $ access hg19.fa -o access.hg19.bed

2) $ autobin GroupA_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ autobin GroupA_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ autobin GroupB_Normal.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

$ autobin GroupB_Tumor.bam -m wgs -g access.hg19.bed --annotate refFlat_hg19.txt

3) $ coverage -p 5 GroupA_Normal.bam -o GroupA_Normal.targetcoverage.cnn

$ coverage -p 5 GroupA_Tumor.bam -o GroupA_Tumor.targetcoverage.cnn

$ coverage -p 5 GroupB_Normal.bam -o GroupB_Normal.targetcoverage.cnn

$ coverage -p 5 GroupB_Tumor.bam -o GroupB_Tumor.targetcoverage.cnn

We also did the same for 'antitarget'.

4) The error occurred at this step:

$ reference Group*Normal.{,anti}targetcoverage.cnn --fasta hg19.fa -o my_reference.cnn

..... Correcting for GC bias...

Correcting for density bias...

Loading target GroupB_Normal.targetcoverage.cnn

Traceback (most recent call last):
  File "/root/anaconda3/bin/", line 13, in <module>
    args.func(args)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/", line 518, in _cmd_reference
    args.do_rmask)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/", line 55, in do_reference
    do_gc, do_edge, False)
  File "/root/anaconda3/lib/python3.6/site-packages/cnvlib/", line 237, in combine_probes
    % (fname, filenames[0]))

RuntimeError: GroupB_Normal.targetcoverage.cnn bins do not match those in GroupA_Normal.targetcoverage.cnn

Any advice or suggestions would be appreciated.

Thank you in advance.

11 months ago, Eric T. (San Francisco, CA) wrote:

The autobin command is meant to be run just once per cohort, using a representative sample (e.g. GroupA_Normal). It looks at the on- and off-target coverage depths of that sample to choose reasonable bin sizes, and then generates corresponding bin coordinates as BED files. For processing multiple samples together, you need to use the same set of BED files each time; otherwise the bin coordinates won't line up across samples, which is what the "bins do not match" error is reporting.

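
To confirm that two coverage files were built on different bins, a minimal Python sketch like the following can compare their coordinates directly. It assumes the .cnn files are tab-separated with a header row that includes chromosome, start and end columns; the helper names read_bins and first_mismatch are hypothetical and are not part of CNVkit.

```python
import csv


def read_bins(path):
    """Return the (chromosome, start, end) tuple of every bin in a .cnn file.

    Assumes a tab-separated file whose header names the columns
    'chromosome', 'start' and 'end'.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(row["chromosome"], int(row["start"]), int(row["end"]))
                for row in reader]


def first_mismatch(path_a, path_b):
    """Describe the first differing bin, or return None if all bins match."""
    bins_a, bins_b = read_bins(path_a), read_bins(path_b)
    if len(bins_a) != len(bins_b):
        return "bin counts differ: %d vs %d" % (len(bins_a), len(bins_b))
    for index, (bin_a, bin_b) in enumerate(zip(bins_a, bins_b)):
        if bin_a != bin_b:
            return "bin %d differs: %s vs %s" % (index, bin_a, bin_b)
    return None
```

Calling first_mismatch("GroupA_Normal.targetcoverage.cnn", "GroupB_Normal.targetcoverage.cnn") on files produced from separate autobin runs would be expected to report a difference, while files produced from one shared set of BED files should return None.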

For case-control samples, I understand the right approach is to put all the samples in one folder and run 'batch', as described in the manual on the homepage.

However, I do not have enough disk space for that, so I first run 'access' on the reference genome, then calculate coverage over the resulting regions and build the reference from the normal samples only.

It runs and appears to work well, but is it all right to skip the autobin step? Thank you in advance.

  • 1) access hg38.fa -o access.hg38.bed

  • 2-1) coverage -p 4 GroupA_Normal.bam access.hg38.bed -o GroupA_Normal.targetcoverage.cnn

  • 2-2) coverage -p 4 GroupB_Normal.bam access.hg38.bed -o GroupB_Normal.targetcoverage.cnn

  • 2-3 and 2-4) Calculate coverage of tumor samples using 'access.hg38.bed'

  • 3) reference Group*Normal.targetcoverage.cnn -f hg38.fa -o pooled-normal_reference.cnn

  • 4) fix GroupA_Tumor.targetcoverage.cnn pooled-normal_reference.cnn -o GroupA_Tumor.cnr

  • 5) segment -m hmm GroupA_Tumor.cnr -o GroupA_Tumor.cns

  • Likewise for GroupB_Tumor

Reply written 11 months ago by prosium (modified 11 months ago)

I suggest adding the target command with a bin size of 5000 after step 1, then using the resulting BED file as input to the coverage command. Otherwise the bins will be huge.

Reply written 11 months ago by Eric T.
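
To illustrate why the bin size matters, here is a rough Python sketch of tiling one accessible region into bins of about 5000 bases. It only illustrates the idea and is not CNVkit's actual binning code; the function name tile_region and the equal-split rule are assumptions made for this example.

```python
def tile_region(chrom, start, end, bin_size=5000):
    """Split one region into contiguous, roughly equal bins of about bin_size bases.

    Illustrative only: the number of bins is the region length divided by
    bin_size, rounded to the nearest integer, with at least one bin.
    """
    length = end - start
    n_bins = max(1, round(length / bin_size))
    # Integer edges spaced evenly from start to end, inclusive of both ends.
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    return [(chrom, edges[i], edges[i + 1]) for i in range(n_bins)]


# Without tiling, a 1 Mb accessible region would be a single 1 Mb bin.
# With tiling, it becomes 200 bins of 5000 bases each.
bins = tile_region("chr1", 0, 1_000_000)
print(len(bins), bins[0], bins[-1])
```

Every sample in the cohort then has to be run against the same tiled BED file, so that the bin coordinates are identical when the reference is built.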