Question: How to use CNVkit results to estimate purity with PureCN
yy1036832160 wrote (9 months ago):

Hello, how can I use CNVkit's .cnn and .cnr files as input for PureCN to estimate purity? Could you give an example? Thank you.

markus.riester wrote (9 months ago):

Have a look at the vignette available at https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.pdf

There is a CNVkit example at the end of the vignette.

Make sure to use version 1.10.0, or the current GitHub or Bioconductor devel version.

PureCN currently works best with variant calls from MuTect 1.1.7, which is easy and fast to run (see the main vignette for examples).
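A minimal base-R sketch of pulling the bin-level log2 ratios out of a CNVkit .cnr file, so they could be handed to PureCN's log.ratio argument. It assumes the usual tab-delimited .cnr layout with a header containing at least chromosome, start, end and log2 columns. The helper read_cnvkit_log2, the toy file and the file names are hypothetical, and the commented PureCN call is only an assumption to check against the CNVkit section of the vignette.

```r
# Hypothetical helper (not part of PureCN or CNVkit): read bin-level log2
# ratios from a CNVkit .cnr file. Assumes a tab-delimited file whose header
# contains at least 'chromosome', 'start', 'end' and 'log2'.
read_cnvkit_log2 <- function(cnr_file) {
  cnr <- read.delim(cnr_file, stringsAsFactors = FALSE)
  required <- c("chromosome", "start", "end", "log2")
  missing_cols <- setdiff(required, colnames(cnr))
  if (length(missing_cols) > 0) {
    stop("missing columns in .cnr file: ", paste(missing_cols, collapse = ", "))
  }
  ratios <- cnr$log2
  # Name each ratio by its bin so the interval order can be checked later
  names(ratios) <- sprintf("%s:%d-%d", cnr$chromosome,
                           as.integer(cnr$start), as.integer(cnr$end))
  ratios
}

# Usage with a tiny made-up .cnr file (illustrative values only)
toy_cnr <- tempfile(fileext = ".cnr")
writeLines(c(
  "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight",
  "chr1\t1000\t1200\tGENE_A\t150.2\t0.12\t0.9",
  "chr1\t5000\t5300\tGENE_B\t80.5\t-0.85\t0.8"
), toy_cnr)
ratio <- read_cnvkit_log2(toy_cnr)
print(ratio)

# Assumed PureCN call, not run here; verify argument names and which CNVkit
# files are accepted against the vignette:
# library(PureCN)
# ret <- runAbsoluteCN(tumor.coverage.file = "sample.cnn",  # hypothetical path
#                      log.ratio = unname(ratio),
#                      seg.file = "sample.seg",  # e.g. from `cnvkit.py export seg`
#                      genome = "hg19")
```

Naming the ratios by interval is just a convenience for sanity checks; the values passed on are the plain numeric vector.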


Hello, I want to use results from the GATK pipeline. Do you have any idea how to do this in PureCN? I see no example in the PureCN overview or quick-start vignettes. Thanks.

(Reply by MatthewP110, 8 days ago)

PureCN currently does not accept all GATK4 output files out of the box, but you can provide the segmentation and the copy number log-ratios (see Section 10.1). PureCN can read GATK4 coverage files (in hdf5 format). Simply provide the tumor coverage file and PureCN will map the provided log-ratios to genomic coordinates (no need to generate and provide an interval.file).
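Since PureCN maps the supplied log-ratios onto the intervals of the tumor coverage file, the log.ratio vector presumably has to follow that file's interval order. A minimal base-R sketch of enforcing this by matching on coordinates, assuming both the coverage intervals and the log-ratios have already been put into data frames with chr, start and end columns. The helper align_log_ratios and all column names are hypothetical, not PureCN API.

```r
# Hypothetical helper (not PureCN API): reorder log-ratios so they follow the
# interval order of the tumor coverage file. Both inputs are data frames with
# 'chr', 'start' and 'end' columns; 'ratios' also has a 'log.ratio' column.
# Coverage intervals without a matching ratio get NA.
align_log_ratios <- function(coverage_intervals, ratios) {
  interval_key <- function(df) {
    sprintf("%s:%d-%d", as.character(df$chr),
            as.integer(df$start), as.integer(df$end))
  }
  idx <- match(interval_key(coverage_intervals), interval_key(ratios))
  n_missing <- sum(is.na(idx))
  if (n_missing > 0) {
    warning(n_missing, " coverage interval(s) have no matching log-ratio")
  }
  ratios$log.ratio[idx]
}

# Usage with made-up intervals: the ratios arrive in a different order
coverage_intervals <- data.frame(chr = c("chr1", "chr1", "chr2"),
                                 start = c(100, 500, 100),
                                 end = c(200, 600, 250),
                                 stringsAsFactors = FALSE)
ratios <- data.frame(chr = c("chr2", "chr1", "chr1"),
                     start = c(100, 500, 100),
                     end = c(250, 600, 200),
                     log.ratio = c(-0.4, 0.3, 0.1),
                     stringsAsFactors = FALSE)
aligned <- align_log_ratios(coverage_intervals, ratios)
print(aligned)
```

Matching on exact coordinates assumes both files use the same coordinate convention; if they do not, the warning will flag it rather than silently shifting ratios.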

By the way, PureCN implements the GATK4 coverage normalization, with added support for sex chromosomes and off-target regions. There are differences in GC normalization and segmentation, though.

(Reply by markus.riester, 7 days ago)

Thanks @markus.riester. I tried to run a small test, but got an error.
My code:

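library(PureCN)
# Read counts from GATK4 CollectReadCounts (hdf5)
normal_hdf5 <- "/MntWorkdir/GATK_CNV/P2_PBMC.hdf5"
tumor_hdf5 <- "/MntWorkdir/GATK_CNV/P2_hdf5/P2-3.recal.hdf5"
# Picard-style interval list from GATK PreprocessIntervals
interval_file <- "/MntWorkdir/BGI_ex_region_hg19_preprocessed_intervals.interval_list"

# Tumor/normal log-ratios computed from the raw read counts
ratio <- calculateLogRatio(readCoverageFile(normal_hdf5), readCoverageFile(tumor_hdf5))
retLogRatio <- runAbsoluteCN(log.ratio=ratio, genome="hg19", plot.cnv=FALSE, interval.file=interval_file)
pdf("TestPlot.pdf")
plotAbs(retLogRatio, 1, type="hist")
dev.off()  # close the PDF device so the plot is written to disk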

Error message:

INFO [2019-06-12 06:45:30] Loading coverage files...
Error in utils::read.table(file, header = TRUE) : 
  more columns than column names
Calls: runAbsoluteCN ... readCoverageFile -> .readCoverageGatk3 -> <Anonymous>
Execution halted

My coverage files (*.hdf5) come from GATK CollectReadCounts. The error message mentions .readCoverageGatk3; does this function support the GATK4 hdf5 format?
My interval file is the Picard-style interval list from GATK PreprocessIntervals, not one generated by PureCN.

(Reply by MatthewP110, 6 days ago)

When you calculate the log-ratio like that, you are not using GATK4's denoising steps. GATK4 should generate a file with log2-ratios; parse that file and provide those log2-ratios via log.ratio, as you did.

The error is likely caused by the format of the interval file. Leave out interval.file and provide the tumor coverage file instead:

retLogRatio <- runAbsoluteCN(tumor = tumor_hdf5, log.ratio=ratio, genome="hg19",  ....)
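Parsing the GATK4 log2-ratio file mentioned above could look roughly like this base-R sketch. It assumes the denoised copy-ratio TSV written by GATK4 DenoiseReadCounts starts with SAM-style header lines beginning with '@', followed by a table with CONTIG, START, END and LOG2_COPY_RATIO columns; please verify this against your own files. The helper read_gatk4_log2 and the toy file are hypothetical, and the commented call spells out the tumor coverage argument as tumor.coverage.file, which is an assumption to check in the PureCN reference manual.

```r
# Hypothetical helper: read log2 copy ratios from a GATK4 denoised copy-ratio
# TSV. Assumes '@' header lines followed by a table with CONTIG, START, END
# and LOG2_COPY_RATIO columns.
read_gatk4_log2 <- function(tsv_file) {
  lines <- readLines(tsv_file)
  body <- lines[!startsWith(lines, "@")]  # drop SAM-style header lines
  cr <- read.delim(text = body, stringsAsFactors = FALSE)
  required <- c("CONTIG", "START", "END", "LOG2_COPY_RATIO")
  missing_cols <- setdiff(required, colnames(cr))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  cr$LOG2_COPY_RATIO
}

# Usage with a tiny made-up file (illustrative values only)
toy_tsv <- tempfile(fileext = ".tsv")
writeLines(c(
  "@HD\tVN:1.6",
  "@SQ\tSN:chr1\tLN:249250621",
  "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO",
  "chr1\t1000\t1200\t0.05",
  "chr1\t5000\t5300\t-0.62"
), toy_tsv)
ratio <- read_gatk4_log2(toy_tsv)
print(ratio)

# Assumed PureCN call (not run here):
# library(PureCN)
# retLogRatio <- runAbsoluteCN(tumor.coverage.file = tumor_hdf5,
#                              log.ratio = ratio, genome = "hg19")
```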

If you want to use GATK4's segmentation, provide it via seg.file. For now you might need to convert it to the expected format (see the main vignette), but I will add support for that soon.
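A minimal base-R sketch of such a format conversion, assuming the GATK4 ModelSegments copy-ratio segments (*.cr.seg) have '@' header lines plus CONTIG, START, END, NUM_POINTS_COPY_RATIO and MEAN_LOG2_COPY_RATIO columns, and assuming PureCN's seg.file expects a DNAcopy-style table (ID, chrom, loc.start, loc.end, num.mark, seg.mean). Both layouts are assumptions to compare with your files and the vignette; the helper gatk4_seg_to_dnacopy, the sample ID and the toy file are hypothetical.

```r
# Hypothetical helper: convert GATK4 ModelSegments copy-ratio segments
# (*.cr.seg) into a DNAcopy-style table and write it as TSV.
gatk4_seg_to_dnacopy <- function(cr_seg_file, sample_id, out_file) {
  lines <- readLines(cr_seg_file)
  body <- lines[!startsWith(lines, "@")]  # drop SAM-style header lines
  seg <- read.delim(text = body, stringsAsFactors = FALSE)
  required <- c("CONTIG", "START", "END",
                "NUM_POINTS_COPY_RATIO", "MEAN_LOG2_COPY_RATIO")
  missing_cols <- setdiff(required, colnames(seg))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  out <- data.frame(ID = sample_id,
                    chrom = seg$CONTIG,
                    loc.start = seg$START,
                    loc.end = seg$END,
                    num.mark = seg$NUM_POINTS_COPY_RATIO,
                    seg.mean = seg$MEAN_LOG2_COPY_RATIO,
                    stringsAsFactors = FALSE)
  write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out)
}

# Usage with a tiny made-up segment file (illustrative values only)
toy_seg <- tempfile(fileext = ".cr.seg")
writeLines(c(
  "@HD\tVN:1.6",
  "CONTIG\tSTART\tEND\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO",
  "chr1\t1000\t250000\t120\t0.02",
  "chr1\t250001\t900000\t310\t-0.71"
), toy_seg)
out_seg <- tempfile(fileext = ".seg")
converted <- gatk4_seg_to_dnacopy(toy_seg, "tumor_sample", out_seg)
print(converted)
# Assumed use (not run here): runAbsoluteCN(..., seg.file = out_seg)
```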

(Reply by markus.riester, 5 days ago)