Question: How to use CNVkit results to estimate purity with PureCN
yy1036832160 wrote (2.1 years ago):

Hello, how can I use CNVkit's .cnn and .cnr files as input for PureCN to estimate purity? Could you give an example? Thank you.

markus.riester wrote (2.1 years ago):

Have a look at the vignette available at

There is a CNVkit example at the end of the vignette.

Make sure to use version 1.10.0, or the current GitHub or Bioconductor devel version.

It currently works best with Mutect 1.1.7. It's easy and fast to run (see the main vignette for examples).
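A rough base-R sketch of the reshaping involved, assuming PureCN is given bin-level log2-ratios plus a DNAcopy-style segmentation file. A .cnr holds per-bin log2 copy ratios, a .cns holds segments, and a .cnn holds coverage or the reference. The helper names and the CNVkit column names (chromosome, start, end, log2, probes) are assumptions for illustration, not part of PureCN or CNVkit, so check them against your own file headers. CNVkit's own "cnvkit.py export seg" command can also write a DNAcopy-style segment file. For the exact runAbsoluteCN() arguments, follow the CNVkit example in the vignette.

```r
# Sketch only: pull PureCN-style inputs out of CNVkit output with base R.
# Column names (chromosome, start, end, log2, probes) are assumptions based on
# typical CNVkit .cnr/.cns files; check the header of your own files.

read_cnvkit_table <- function(path) {
  # CNVkit writes tab-separated tables with a single header row
  read.delim(path, stringsAsFactors = FALSE)
}

cnr_log_ratios <- function(cnr) {
  # Keep the coordinates next to the log2 values so their order can be
  # compared with the intervals PureCN works on
  cnr[, c("chromosome", "start", "end", "log2")]
}

cns_to_dnacopy_seg <- function(cns, sample_id) {
  # DNAcopy-style segmentation: ID, chrom, loc.start, loc.end, num.mark, seg.mean
  data.frame(
    ID        = sample_id,
    chrom     = cns$chromosome,
    loc.start = cns$start,
    loc.end   = cns$end,
    num.mark  = cns$probes,
    seg.mean  = cns$log2,
    stringsAsFactors = FALSE
  )
}

# Hypothetical usage (file names are placeholders):
# cnr <- cnr_log_ratios(read_cnvkit_table("tumor.cnr"))
# seg <- cns_to_dnacopy_seg(read_cnvkit_table("tumor.cns"), "tumor")
# write.table(seg, "tumor.seg", sep = "\t", quote = FALSE, row.names = FALSE)
# Then pass log.ratio = cnr$log2 and seg.file = "tumor.seg" to
# PureCN::runAbsoluteCN(), together with the other arguments from the vignette.
```

Keeping the coordinates next to the log2 values makes it easy to confirm that the ratios are in the same order as the intervals PureCN uses before passing them on.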


Hello, I want to use results from the GATK pipeline. Do you have any idea how to do this in PureCN? I don't see an example in the PureCN overview or quickstart. Thanks.

(Reply written 16 months ago by MatthewP780)

PureCN currently does not accept all GATK4 output files out of the box, but you can provide the segmentation and the copy number log-ratios (see Section 10.1). PureCN can read GATK4 coverage files (HDF5 format). Simply provide the tumor coverage, and PureCN will map the provided log-ratios to genomic coordinates, so there is no need to generate and provide an interval.file.

Btw, PureCN implements the GATK4 coverage normalization, with added support for sex chromosomes and off-target regions. There are differences in GC normalization and segmentation, though.

(Reply written 16 months ago by markus.riester)

Thanks @markus.riester. I tried to run a small test, but it raised an error.
My code:

library(PureCN)

# Coverage from GATK4 CollectReadCounts (HDF5)
normal_hdf5 <- "/MntWorkdir/GATK_CNV/P2_PBMC.hdf5"
tumor_hdf5 <- "/MntWorkdir/GATK_CNV/P2_hdf5/P2-3.recal.hdf5"
# Picard-style interval list from GATK PreprocessIntervals
interval_file <- "/MntWorkdir/BGI_ex_region_hg19_preprocessed_intervals.interval_list"

ratio <- calculateLogRatio(readCoverageFile(normal_hdf5), readCoverageFile(tumor_hdf5))
retLogRatio <- runAbsoluteCN(log.ratio=ratio, genome="hg19", plot.cnv=FALSE, interval.file=interval_file)
plotAbs(retLogRatio, 1, type="hist")

Error message

INFO [2019-06-12 06:45:30] Loading coverage files...
Error in utils::read.table(file, header = TRUE) : 
  more columns than column names
Calls: runAbsoluteCN ... readCoverageFile -> .readCoverageGatk3 -> <Anonymous>
Execution halted

My *.hdf5 coverage files come from GATK CollectReadCounts. The error message mentions .readCoverageGatk3; does this function support the GATK4 HDF5 format?
My interval file is the Picard-style interval list from GATK PreprocessIntervals, not one generated by PureCN.

(Reply written 16 months ago by MatthewP780)

When you calculate the log-ratio like that, you are not using GATK4's denoising steps. GATK4 should generate a file with denoised log2-ratios; parse that file and provide those log2-ratios via log.ratio, as you did.
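A minimal base-R sketch of parsing such a file, assuming the tab-separated layout that GATK4's DenoiseReadCounts writes (.denoisedCR.tsv): SAM-style "@" header lines followed by the columns CONTIG, START, END, LOG2_COPY_RATIO. Both the column names and the helper names are assumptions, so check them against your own output. Because log.ratio should contain one value per interval of the tumor coverage, the sketch also matches the ratios to a reference interval table by exact coordinates. Intervals missing from the denoised file, for example because they were filtered out, come back as NA instead of silently shifting everything.

```r
# Sketch only: read a GATK4 copy-ratio TSV (for example a .denoisedCR.tsv
# from DenoiseReadCounts) and line its log2 ratios up with a set of intervals.
# Column names CONTIG, START, END, LOG2_COPY_RATIO are assumed.

read_gatk4_table <- function(path) {
  lines <- readLines(path)
  # Drop the SAM-style header lines (@HD, @SQ, @RG, ...) before parsing
  read.delim(text = lines[!startsWith(lines, "@")], stringsAsFactors = FALSE)
}

align_log_ratios <- function(cr, intervals) {
  # Match on exact coordinates; integer conversion avoids "1e+05"-style keys
  key <- function(d) {
    paste(d$CONTIG, as.integer(d$START), as.integer(d$END), sep = ":")
  }
  # Intervals absent from 'cr' (for example, filtered ones) become NA
  cr$LOG2_COPY_RATIO[match(key(intervals), key(cr))]
}

# Hypothetical usage (file name is a placeholder):
# cr <- read_gatk4_table("P2-3.denoisedCR.tsv")
# ratio <- cr$LOG2_COPY_RATIO
```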

The issue is most likely the format of the interval file. You don't need one here; just provide the tumor coverage file instead:

retLogRatio <- runAbsoluteCN(tumor = tumor_hdf5, log.ratio=ratio, genome="hg19",  ....)

If you want to use GATK4's segmentation, provide it via seg.file. For now you might need to convert it to the expected format (see the main vignette), but I will add support for that soon.
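One possible conversion, sketched in base R. It assumes that seg.file accepts a DNAcopy-style table (ID, chrom, loc.start, loc.end, num.mark, seg.mean) and that the GATK4 segment file (for example from ModelSegments or CallCopyRatioSegments) has "@" header lines followed by the columns CONTIG, START, END, NUM_POINTS_COPY_RATIO and MEAN_LOG2_COPY_RATIO. Treat both formats as assumptions and confirm them against the vignette and your own files. The function name is hypothetical.

```r
# Sketch only: convert a GATK4 copy-ratio segment file into a DNAcopy-style
# table. Input column names are assumptions based on GATK4 segment output.

gatk4_seg_to_dnacopy <- function(path, sample_id) {
  lines <- readLines(path)
  # Skip the SAM-style "@" header lines, then parse the tab-separated table
  seg <- read.delim(text = lines[!startsWith(lines, "@")],
                    stringsAsFactors = FALSE)
  data.frame(
    ID        = sample_id,
    chrom     = seg$CONTIG,
    loc.start = seg$START,
    loc.end   = seg$END,
    num.mark  = seg$NUM_POINTS_COPY_RATIO,
    seg.mean  = seg$MEAN_LOG2_COPY_RATIO,
    stringsAsFactors = FALSE
  )
}

# Hypothetical usage (paths are placeholders):
# seg <- gatk4_seg_to_dnacopy("P2-3.called.seg", "P2-3")
# write.table(seg, "P2-3_dnacopy.seg", sep = "\t", quote = FALSE, row.names = FALSE)
# Then pass seg.file = "P2-3_dnacopy.seg" to runAbsoluteCN().
```

Any extra columns, such as a CALL column from CallCopyRatioSegments, are simply ignored by this conversion.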

(Reply written 16 months ago by markus.riester)

PureCN version 1.15.4 adds support for the GATK4 copy number workflow.

(Reply written 14 months ago by markus.riester)