CNVKIT - unable to produce scatter and diagram pdfs
1
0
Entering edit mode
4.8 years ago
ww22runner ▴ 60

Hello everyone,

I have a couple of tumor-germline paired samples and I tried running cnvkit this way:

cnvkit.py batch *Tumor.bam --normal *Normal.bam \
    --targets my_baits.bed  \
    --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
    --output-reference my_reference.cnn --output-dir results/ \
    --diagram --scatter

When I do this, I get the following output and the analysis stops with the creation of a cnr file. I do not see a cns file/the scatter and diagram pdfs being produced.

> Correcting for GC bias... Correcting for RepeatMasker bias...
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias... Correcting for RepeatMasker bias... 
> Loading sampleA_Normal.antitargetcoverage.cnn
> Correcting for GC bias... Correcting for RepeatMasker bias...
> ....local/lib/python3.6/site-packages/pandas/core/frame.py:7116:
> FutureWarning: Sorting because non-concatenation axis is not aligned.
> A future version of pandas will change to not sort by default.
> To accept the future behavior, pass 'sort=False'.
> To retain the current behavior and silence the warning, pass
> 'sort=True'.
> 
>   sort=sort, Calculating average bin coverages Calculating bin spreads
> Targets: 1162 (13.7499%) bins failed filters (log2 < -5.0, log2 > 5.0,> spread > 1.0) 
> Antitargets: 19439 (100.0000%) bins failed filters 
> Wrote my_reference.cnn  with 27890 regions 
> Running 1 samples in 8 processes (that's 8  processes per bam) 
> Running the CNVkit pipeline on SampleB_Tumor.bam
> ... Processing reads in SampleB_Tumor.bam 
> Time: 76.712 seconds > (63585 reads/sec, 110 bins/sec) 
> Summary: #bins=8451, #reads=4877720,  mean=577.1767, min=0.0, max=3535.910891089109 
> Percent reads in regions: 92.246 (of 5287755 mapped)
>  Wrote SampleB_Tumor.targetcoverage.cnn  with 8451 regions 
> Processing reads in SampleB_Tumor.bam 
> Time: 2.223  seconds (0 reads/sec, 8746 bins/sec) 
> Summary: #bins=19439, #reads=0,  mean=0.0000, min=0.0, max=0.0 
> Percent reads in regions: 0.000 (of 5287755 mapped) 
> Wrote  SampleB_Tumor.antitargetcoverage.cnn  with 19439 regions 
> Processing target: SampleB_Tumor
> Keeping 7289 of 8451 bins Correcting for GC bias... 
> Correcting for density bias...
> Processing antitarget: SampleB_Tumor
> Keeping 0 of 19439 bins
> WARNING: most bins have no or very low coverage; check that the right  BED file was used 
> Wrote SampleB_Tumor.cnr with 7289 regions 
> Segmenting SampleB_Tumor.cnr
> ... Segmenting with method 'cbs', significance threshold 0.0001, in 8  processes
> .../cnvkit/cnvlib/smoothing.py:197:
> RuntimeWarning: invalid value encountered in greater   bad_idx = (y >
> x.max()) | (y < x.min())
> .../cnvkit/cnvlib/smoothing.py:197:
> RuntimeWarning: invalid value encountered in less   bad_idx = (y >
> x.max()) | (y < x.min())

I see the RuntimeWarnings repeat several times and the command ends with no other output being produced. Has anyone else experienced this?

Any input is appreciated! My cnvkitversion is 0.9.6.dev0 and python version is 3.6.1 . Thank you.

cnvkit • 2.5k views
ADD COMMENT
1
Entering edit mode

Keeping 0 of 19439 bins WARNING: most bins have no or very low coverage; check that the right BED file was used

Have you checked this? Your command says --targets my_baits.bed; are these the same as the target regions that you are supposed to pass?

ADD REPLY
0
Entering edit mode

Hi Steve, I might be mistaken but I send in my target regions (baits) as a bed file to --targets. I see the message above for all of my tumor samples and had initially understood this as since anti target sites are not targeted sites, there is little/no coverage there. Should I be doing something different?

ADD REPLY
1
Entering edit mode

For reference, the CNVkit portion of our pipeline is here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/main.nf#L3861 we do some extra custom processing as well

Our targets .bed files are here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/tree/ad4b2f42efb56d0753e58c2e07c4ef6dc7fc2337/targets

I did not write this portion of the pipeline so my expertise is limited, but this is what we've got.

ADD REPLY
0
Entering edit mode

Hi Steve,

Thank you for your input and advise! I could run this script using Python 2 with the same bed file without issues but have been facing problems ever since I switched over to Python3. Nonetheless, I will refer to your pipeline to better understand how I could be running cnvkit.

ADD REPLY
0
Entering edit mode

Oh well if it worked in Python 2, then you should keep it that way. Use a conda installation or container of some sort to isolate it. This is our Dockerfile for building a CNVKit container: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/master/containers/cnvkit-0.9.0/Dockerfile ; its using the Ubuntu 16.04 default Python which should be 2.7

ADD REPLY
0
Entering edit mode

for our sequencing panel, there were two .bed files provided: probes.bed (baits?) and targets.bed. The regions contained in both are different. The latter is the one that we use for CNVKit (targets.bed). You might want to check that there is not some other .bed file related to the sequencing that you are supposed to use instead of your baits.bed file.

ADD REPLY
0
Entering edit mode

If probes.bed indicates the captured regions, I'd go with that over targets.bed for CNV calling -- not all of the requested targets may have been successfully captured by the panel kit.

ADD REPLY
0
Entering edit mode
4.7 years ago
Eric T. ★ 2.8k

It could be that numpy or pandas became more stringent in a recent release and so datapoints that used to pass through are now being dropped or set to NA -- just speculating.

There are some changes worth considering in the development version of CNVkit, v0.9.7.dev0 or so. (The master branch on github.) Could you try that and see how it goes?

ADD COMMENT
0
Entering edit mode

Hi Eric,

I kind of had a similar problem. When using cnvkit.py coverage sample.bam antitarget.bed -o sample.antitargetcoverage.cnn, I got summary like this: Time: 14.480 seconds (0 reads/sec, 3945 bins/sec) Summary: #bins=57119, #reads=0, mean=0.0000, min=0.0, max=0.0 Percent reads in regions: 0.000 (of 23342057 mapped)

I opened the output cnn file and the depth column are all 0. Someone had used the same command on the same bam file and bed file but got different outputs with all the depths between 0 and 1.6. Do you have any idea of what went wrong?

ADD REPLY

Login before adding your answer.

Traffic: 2620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6