Question: How to use result of CNVkit to estimate purity by PureCN
21 months ago by
yy10368321600 wrote:

Hello, how can I use CNVkit's .cnn and .cnr files as input for PureCN to estimate purity? Can you give an example? Thank you.

sequencing • 1.2k views
modified 20 months ago by markus.riester490 • written 21 months ago by yy10368321600
20 months ago by
markus.riester490 wrote:

Have a look at the vignette available at

There is a CNVkit example at the end of the vignette.

Make sure to use version 1.10.0 or current GitHub or Bioconductor devel.

It currently works best with Mutect 1.1.7. It's easy and fast to run (see the main vignette for examples).
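Before handing CNVkit output to PureCN, it can help to confirm that the .cnr file looks as expected. The following is a minimal base R sketch, assuming the usual tab-separated CNVkit .cnr layout with `chromosome`, `start`, `end` and `log2` columns. The helper name `read_cnr` and the demo file contents are made up for illustration and are not part of PureCN or CNVkit.

```r
# Hypothetical helper (not part of PureCN): load a CNVkit .cnr file
# and check that the columns this sketch assumes are present.
read_cnr <- function(path) {
  cnr <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
  required <- c("chromosome", "start", "end", "log2")
  missing_cols <- setdiff(required, names(cnr))
  if (length(missing_cols) > 0) {
    stop("Missing columns in .cnr file: ", paste(missing_cols, collapse = ", "))
  }
  cnr
}

# Tiny made-up file so the sketch runs on its own.
demo_cnr <- tempfile(fileext = ".cnr")
writeLines(c(
  "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight",
  "chr1\t10000\t10250\tGENE_A\t120.5\t-0.15\t0.9",
  "chr1\t10500\t10750\tGENE_A\t98.2\t0.22\t0.8"
), demo_cnr)

cnr <- read_cnr(demo_cnr)
print(summary(cnr$log2))  # quick look at the per-bin log2 ratios
```

The CNVkit example in the vignette remains the reference for how these files are actually passed to PureCN.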

written 20 months ago by markus.riester490

Hello, I want to use results from the GATK pipeline. Do you have any idea how to do that in PureCN? I see no example in the PureCN overview and quick start. Thanks.

written 11 months ago by MatthewP620

It currently does not take all GATK4 output files out-of-the-box, but you can provide the segmentation and copy number log-ratios (see Section 10.1). PureCN can read GATK4 coverage files (in hdf5 format). Simply provide the tumor coverage and PureCN will be able to map provided log-ratios to the genomic coordinates (no need to generate and provide an interval.file).

Btw, PureCN implements the GATK4 coverage normalization with added support for sex chromosomes and off-target regions. There are differences in GC-normalization and segmentation though.

written 11 months ago by markus.riester490

Thanks @markus.riester. I tried to run a small test, but it raised an error.
My code

normal_hdf5 <- "/MntWorkdir/GATK_CNV/P2_PBMC.hdf5"
tumor_hdf5 <- "/MntWorkdir/GATK_CNV/P2_hdf5/P2-3.recal.hdf5"
interval_file <- "/MntWorkdir/BGI_ex_region_hg19_preprocessed_intervals.interval_list"

ratio <- calculateLogRatio(readCoverageFile(normal_hdf5), readCoverageFile(tumor_hdf5))
retLogRatio <- runAbsoluteCN(log.ratio=ratio, genome="hg19", plot.cnv=FALSE, interval.file=interval_file)
plotAbs(retLogRatio, 1, type="hist")

Error message

INFO [2019-06-12 06:45:30] Loading coverage files...
Error in utils::read.table(file, header = TRUE) : 
  more columns than column names
Calls: runAbsoluteCN ... readCoverageFile -> .readCoverageGatk3 -> <Anonymous>
Execution halted

My coverage files (*.hdf5) come from GATK CollectReadCounts. The error message mentions .readCoverageGatk3; does this function support the GATK4 hdf5 format?
My interval file is a Picard-style interval list from GATK PreprocessIntervals, not one generated by PureCN.

modified 11 months ago • written 11 months ago by MatthewP620

When you calculate the log-ratio like that, you are not using GATK4's denoising steps. GATK4 should generate a file with log2-ratios; parse that file and provide the corresponding log2-ratios via log.ratio, as you did.
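Parsing that file could look roughly like the sketch below. It assumes the denoised copy-ratio TSV written by GATK4 DenoiseReadCounts, which starts with SAM-style header lines beginning with `@`, followed by a table with `CONTIG`, `START`, `END` and `LOG2_COPY_RATIO` columns. The helper name `read_denoised_cr` and the demo values are hypothetical.

```r
# Hypothetical helper (not part of PureCN): read a GATK4 denoised
# copy-ratio TSV, skipping the SAM-style "@" header lines.
read_denoised_cr <- function(path) {
  lines <- readLines(path)
  body <- lines[!startsWith(lines, "@")]
  read.delim(text = body, header = TRUE, stringsAsFactors = FALSE)
}

# Tiny made-up file so the sketch runs on its own.
demo_cr <- tempfile(fileext = ".tsv")
writeLines(c(
  "@HD\tVN:1.6",
  "@SQ\tSN:1\tLN:249250621",
  "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO",
  "1\t10001\t10250\t-0.12",
  "1\t10501\t10750\t0.34"
), demo_cr)

cr <- read_denoised_cr(demo_cr)
ratio <- cr$LOG2_COPY_RATIO  # numeric vector, one value per interval
print(ratio)
```

The resulting vector is only meaningful if its intervals line up, in the same order, with the intervals of the tumor coverage file, so it is worth comparing `CONTIG`, `START` and `END` against the coverage before passing `ratio` on.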

The issue is likely the wrong format of the interval file. Just run it with:

retLogRatio <- runAbsoluteCN(tumor = tumor_hdf5, log.ratio=ratio, genome="hg19",  ....)

If you want to use GATK4's segmentation, provide it via seg.file (you might need to change the format for now, but I will add support for that soon, see the main vignette).
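As an illustration of such a format change, here is a minimal sketch that converts a GATK4 segmentation into a DNAcopy-style table. It assumes the ModelSegments copy-ratio segment file with `CONTIG`, `START`, `END`, `NUM_POINTS_COPY_RATIO` and `MEAN_LOG2_COPY_RATIO` columns, and a target layout of `ID`, `chrom`, `loc.start`, `loc.end`, `num.mark`, `seg.mean`. The helper name `gatk_seg_to_dnacopy`, the sample ID and the demo values are hypothetical.

```r
# Hypothetical helper (not part of PureCN): convert a GATK4 segment
# file into a DNAcopy-style data frame.
gatk_seg_to_dnacopy <- function(path, sample_id) {
  lines <- readLines(path)
  seg <- read.delim(text = lines[!startsWith(lines, "@")],
                    header = TRUE, stringsAsFactors = FALSE)
  data.frame(
    ID = sample_id,
    chrom = seg$CONTIG,
    loc.start = seg$START,
    loc.end = seg$END,
    num.mark = seg$NUM_POINTS_COPY_RATIO,
    seg.mean = seg$MEAN_LOG2_COPY_RATIO,
    stringsAsFactors = FALSE
  )
}

# Tiny made-up input so the sketch runs on its own.
demo_seg <- tempfile(fileext = ".seg")
writeLines(c(
  "@HD\tVN:1.6",
  "CONTIG\tSTART\tEND\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO",
  "1\t10001\t500000\t120\t-0.45",
  "1\t500001\t900000\t95\t0.10"
), demo_seg)

dnacopy <- gatk_seg_to_dnacopy(demo_seg, sample_id = "Sample1")

# Write a tab-separated file that could then be supplied via seg.file.
out_seg <- tempfile(fileext = ".seg")
write.table(dnacopy, out_seg, sep = "\t", quote = FALSE, row.names = FALSE)
```

With PureCN 1.15.4 or later this conversion may no longer be needed, since GATK4 copy number output is supported directly.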

written 11 months ago by markus.riester490

PureCN version 1.15.4 adds support for the GATK4 copy number workflow.

written 9 months ago by markus.riester490