PureCN issue: having problems running a small example
1
0
Entering edit mode
2.4 years ago
bboy • 0

Hi ,

I am going to try to describe the problem as i am not sure if and/or how to attach the input files required.

What i am trying to do is to see how PureCN works. To that end i have taken 3 normal samples, mapped them to hg38 (bwa) extracted the reads mapping to chr20 and chrX, sub-selected roughly 800 exons (oh yea this is are simulated WES example i am using to test various things i am playing with ) the crated PON out of those 3 samples (bcbio-nextgen pipeline : followed the tutorial). Next i simulated one tumor sample (intorduces cnvs snps, etc.) mapped it using the same set of parameters and for the normals (bwa, bet, everything the same). In case of normals i called mutations using FreeBayes and for somatic ones in tumor i used Mutect2 (followed all best practices). Once all data was collected i tried to utilize PureCN and reached the final stage with no objections from any of the stages:

Rscript ${PC}/PureCN.R --out purecn --tumor tumor00-ST_coverage_loess.txt.gz --sampleid tumor00-ST --vcf tumor00-ST-batch-effects.vcf.gz --normaldb normalDB_hg38.rds  --mappingbiasfile mapping_bias_hg38.rds --intervals PON.multicov.txt --snpblacklist simple_repeat.bed --genome hg38 --force --postoptimize --seed 67 --bootstrapn 10 --cores 8

output:

INFO [2021-12-15 23:56:42] Loading PureCN 2.1.5...
WARN [2021-12-15 23:56:44] Recommended to provide --funsegmentation PSCBS.
INFO [2021-12-15 23:56:44] Using BiocParallel MulticoreParam backend with 8 cores.
INFO [2021-12-15 23:56:44] ------------------------------------------------------------
INFO [2021-12-15 23:56:44] PureCN 2.1.5
INFO [2021-12-15 23:56:44] ------------------------------------------------------------
INFO [2021-12-15 23:56:44] Arguments: -tumor.coverage.file tumor00-ST_coverage_loess.txt.gz -log.ratio  -seg.file  -vcf.file tumor00-ST-batch-effects.vcf.gz -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf mapping_bias_hg38.rds -args.segmentation 0.005,NULL -sampleid tumor00-ST -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file PON.multicov.txt -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -DB.info.flag DB -POPAF.info.field POP_AF -model beta -post.optimize TRUE -log.file tumor00-ST.log -normal.coverage.file <data> -normalDB <data> -args.filterVcf <data> -fun.segmentation <data> -test.num.copy <data> -test.purity <data> -speedup.heuristics <data> -BPPARAM <data>
INFO [2021-12-15 23:56:44] Using BiocParallel for parallel optimization.
INFO [2021-12-15 23:56:44] Loading coverage files...
INFO [2021-12-15 23:56:44] Mean target coverages: 49X (tumor) 49X (normal).
WARN [2021-12-15 23:56:44] Allosome coverage missing, cannot determine sex.
WARN [2021-12-15 23:56:44] Allosome coverage missing, cannot determine sex.
INFO [2021-12-15 23:56:45] Removing 77 intervals with missing log.ratio.
INFO [2021-12-15 23:56:45] Removing 1 low/high GC targets.
INFO [2021-12-15 23:56:45] Removing 1684 intervals excluded in normalDB.
INFO [2021-12-15 23:56:45] Removing 2 intervals with low total coverage in normal (< 150.00 reads).
INFO [2021-12-15 23:56:45] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2021-12-15 23:56:45] Removing 9 low count (< 100 total reads) intervals.
INFO [2021-12-15 23:56:45] Removing 615 low coverage (< 0.0015X) intervals.
INFO [2021-12-15 23:56:45] Using 395 intervals (188 on-target, 207 off-target).
INFO [2021-12-15 23:56:45] Ratio of mean on-target vs. off-target read counts: 0.06
INFO [2021-12-15 23:56:45] Mean off-target bin size: 118201
INFO [2021-12-15 23:56:45] AT/GC dropout: 0.97 (tumor), 0.96 (normal), 1.02 (coverage log-ratio).
INFO [2021-12-15 23:56:45] Loading VCF...
INFO [2021-12-15 23:56:46] Found 225 variants in VCF file.
INFO [2021-12-15 23:56:46] Removing 2 triallelic sites.
INFO [2021-12-15 23:56:46] Maximum of POPAP INFO is > 1, assuming -log10 scaled values
WARN [2021-12-15 23:56:46] vcf.file has no DB info field for membership in germline databases. Found and used valid population allele frequency > 0.001000 instead.
INFO [2021-12-15 23:56:46] 171 (76.7%) variants annotated as likely germline (DB INFO flag).
WARN [2021-12-15 23:56:46] Did not find base quality scores, will use global error rate of 0.0010 instead.
INFO [2021-12-15 23:56:46] tumor00-ST is tumor in VCF file.
INFO [2021-12-15 23:56:46] 33 homozygous and 63 heterozygous variants on chrX.
INFO [2021-12-15 23:56:46] Sex from VCF: F (Fisher's p-value: 0.879, odds-ratio: 0.92).
INFO [2021-12-15 23:56:46] Detected MuTect2 VCF.
INFO [2021-12-15 23:56:46] Removing 44 Mutect2 calls due to blacklisted failure reasons.
INFO [2021-12-15 23:56:46] Global base quality score of 29
INFO [2021-12-15 23:56:46] Minimum number of supporting reads ranges from 3 to 4, depending on coverage and BQS.
INFO [2021-12-15 23:56:47] Removing 64 variants with AF < 0.030 or AF >= 0.970 or insufficient supporting reads or depth < 15.
INFO [2021-12-15 23:56:51] Removing 6 blacklisted variants.
INFO [2021-12-15 23:56:51] Removing 0 low quality variants with non-offset BQ < 25.
INFO [2021-12-15 23:56:51] Total size of targeted genomic region: 0.05Mb (0.06Mb with 50bp padding).
INFO [2021-12-15 23:56:52] 6.9% of targets contain variants.
INFO [2021-12-15 23:56:52] Removing 94 variants outside intervals.
INFO [2021-12-15 23:56:52] Setting somatic prior probabilities for dbSNP hits to 0.000500 or to 0.500000 otherwise.
INFO [2021-12-15 23:56:52] Loading mapping bias file mapping_bias_hg38.rds...
INFO [2021-12-15 23:56:52] Found 97 variants in mapping bias file.
INFO [2021-12-15 23:56:52] Imputing mapping bias for 14 variants...
INFO [2021-12-15 23:56:52] Excluding 0 novel or poor quality variants from segmentation.
INFO [2021-12-15 23:56:52] Sample sex: ?
INFO [2021-12-15 23:56:52] Segmenting data...
INFO [2021-12-15 23:56:52] Interval weights found, will use weighted CBS.
INFO [2021-12-15 23:56:52] Loading pre-computed boundaries for DNAcopy...
INFO [2021-12-15 23:56:52] Setting undo.SD parameter to 0.500000.
Setting multi-figure configuration
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
INFO [2021-12-15 23:56:53] Setting prune.hclust.h parameter to 0.100000.
Error in hclust(dist(dx), method = method) :
  must have n >= 2 objects to cluster
Calls: runAbsoluteCN -> do.call -> <Anonymous> -> .pruneByHclust -> hclust
In addition: Warning message:
In .bcfHeaderAsSimpleList(header) :
  duplicate keys in header will be forced to unique rownames
Execution halted
`

I tried to fiddle with the code to see what i am missing (it seams seg-pval are either NA or missing) but at this point i am done messing around and hoping someone could direct me to a small (really small) working example that i could just plug into the above cmd and see that is works . I assume the problem is associated to my inputs but i would need days to figure it out.

Any readily available small example?

Sincerely Thank you for any help u might provide on this issue!

PureCN • 916 views
ADD COMMENT
1
Entering edit mode
2.4 years ago

Feel free to open a GitHub issue with an reproducible example to get the crash fixed. But building a toy example is tricky (took me a while to get something for examples and tests - data(purecn.example.output)). Even with 1,000 SNPs it will be pretty fast, so simulate a larger territory on more chromosomes.

ADD COMMENT

Login before adding your answer.

Traffic: 1469 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6