Question: Passing CNVkit output to PureCN to account for cellularity
5 months ago by
biologist10 wrote:

Hello all,

I have a question about how to pass CNVkit's output to PureCN to account for tumor cellularity.

Following page 6 of https://bioconductor.org/packages/release/bioc/vignettes/PureCN/inst/doc/Quick.pdf, I first run:

Rscript $PURECN/NormalDB.R --outdir $OUT_REF --normal_panel $NORMAL_PANEL \
--assay agilent_v6 --genome hg19 --force

When I run this, I am asked for --coveragefiles. Can I provide a file from cnvkit for this?

Secondly:

cnvkit.py export seg $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cns \
--enumerate-chroms \
-o $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg

And finally:

Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
--sampleid $SAMPLEID \
--tumor $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cnr \
--segfile $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg \
--normal_panel $OUT_REF/mapping_bias_agilent_v6_hg19.rds \
--vcf ${SAMPLEID}_mutect.vcf \
--statsfile ${SAMPLEID}_mutect_stats.txt \
--snpblacklist hg19_simpleRepeats.bed \
--genome hg19 \
--funsegmentation none \
--force --postoptimize --seed 123

Here, could someone please advise me on what --snpblacklist is and what should be supplied as --normal_panel?

I tried reading the manuals but I am still confused.

Any suggestions would be appreciated, thanks!

5 months ago by
markus.riester480 wrote:

Hi,

If your VCF file contains a matched normal, then --normal_panel is not crucial. Essentially, PureCN assumes that heterozygous SNPs have an expected allelic fraction of 0.5 in the absence of copy number events. By providing a pool of normals, PureCN can check all common SNPs for deviation from 0.5 and can adjust or discard poor-quality SNPs. When a matched normal is available, PureCN has access to at least a single normal.
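To see why 0.5 is the baseline and how copy number events move it, here is a small sketch of the usual purity/copy-number mixture formula for a germline heterozygous SNP. It is an illustration, not PureCN's actual likelihood model; the function name expected_af and the example purity of 0.6 are made up for this sketch.

```shell
# Hypothetical helper: expected allelic fraction of a germline het SNP.
#   $1 = tumor purity (0..1)
#   $2 = total tumor copy number at the locus
#   $3 = number of tumor copies carrying the SNP allele
# Normal cells are assumed diploid with one copy of the SNP allele.
expected_af() {
  LC_ALL=C awk -v p="$1" -v c="$2" -v m="$3" 'BEGIN {
    printf "%.3f\n", (p * m + (1 - p)) / (p * c + 2 * (1 - p))
  }'
}

neutral=$(expected_af 0.6 2 1)   # no copy number event
gain=$(expected_af 0.6 3 2)      # single-copy gain of the SNP allele
loss=$(expected_af 0.6 1 1)      # loss of the other allele

echo "neutral=$neutral gain=$gain loss=$loss"
```

With a purity of 0.6 this gives 0.500 for the copy-neutral case, 0.615 for the gain and 0.714 for the loss. A SNP that deviates from 0.5 across many normals, where no copy number event is expected, is therefore a candidate for mapping bias rather than a real signal.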

Have a look at the FAQ section on how to create a panel-of-normals VCF. There is also a NormalDB.R script that can generate the mapping_bias.rds file for you. But again, for test runs, you won't need this.

--snpblacklist can be any BED file. It will simply ignore every variant in the VCF that falls into those regions. Have a look at the main vignette where it downloads the simple_repeats track from UCSC.
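The effect of a blacklist can be sketched outside PureCN with plain awk: drop every VCF record whose position falls inside a BED interval. This only illustrates the idea and is not how PureCN implements it; the file names, the toy records and the function name filter_blacklisted are invented for the example.

```shell
# Toy illustration only; nothing here calls PureCN.
tmpdir=$(mktemp -d)

# BED intervals are 0-based and half-open, so chr1 100-200 covers
# the 1-based VCF positions 101..200.
printf 'chr1\t100\t200\n' > "$tmpdir/blacklist.bed"

# Minimal VCF-like input: one header line and three variants.
{
  printf '#CHROM\tPOS\tID\tREF\tALT\n'
  printf 'chr1\t150\t.\tA\tG\n'
  printf 'chr1\t250\t.\tC\tT\n'
  printf 'chr2\t150\t.\tG\tA\n'
} > "$tmpdir/toy.vcf"

# Usage: filter_blacklisted BED VCF
# The first file loads the intervals; the second is printed minus any
# variant lying inside an interval on the same chromosome.
filter_blacklisted() {
  awk -F'\t' '
    NR == FNR { n++; chrom[n] = $1; first[n] = $2; last[n] = $3; next }
    /^#/      { print; next }
    {
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && $2 > first[i] && $2 <= last[i]) next
      print
    }
  ' "$1" "$2"
}

filter_blacklisted "$tmpdir/blacklist.bed" "$tmpdir/toy.vcf" > "$tmpdir/filtered.vcf"
kept=$(grep -vc '^#' "$tmpdir/filtered.vcf")
echo "variants kept: $kept"
```

Only the chr1:150 variant sits inside the interval, so two of the three toy variants are kept. The chr2 variant at the same coordinate is untouched because the chromosome differs.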

And yes, PureCN accepts CNVkit coverage files.

If you use the most recent GitHub version (which will become the new stable release next week), you can replace --funsegmentation none with --funsegmentation Hclust. This might undo some over-segmentation when present.

Best, Markus


Hi Markus,

Thank you for your detailed response. I would like to clarify a few things:

1) For the first command, you mentioned that PureCN accepts CNVkit coverage files. Does this mean a .cnr or a .cns file from CNVkit? (Sorry for the basic question, but I wasn't sure what a coverage file is here.) What output would this command produce, and where would it be used next?

2) For the last command, my understanding is that for --tumor we need to provide a VCF for the tumor from Mutect, and that if the tumor was matched with the normal, we do not have to specify --normal_panel. For the tumor VCF, do I run the tumor against a panel of normals that I had already created, or just against its own matched normal sample? Is --statsfile optional? Is --segfile optional?

Thank you once again for your time!

modified 5 months ago • written 5 months ago by biologist10

1) The .cnr files.

2) You run Mutect as you normally would, i.e. provide the normal BAM file; you can also provide --normal_panel for artifact flagging. --statsfile is optional, but since Mutect generates it automatically, it's easy to add. When provided, PureCN can remove artifacts based on the flags in the stats file (unfortunately, Mutect1 does not add those to the VCF). So if you do the artifact filtering yourself (keep the germline SNPs, though!), you can skip --statsfile.

If you don't provide --segfile, PureCN should segment the coverage log-ratio (in the .cnr file). So if you want to use the CNVkit segmentation, provide --segfile.
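The two cases can be sketched as a small dry-run helper that only assembles the arguments and executes nothing. The flags are copied from the command in the question. The helper name build_purecn_args is hypothetical, and pairing --segfile with --funsegmentation none (and omitting both otherwise) is an assumption based on that command, so check it against the options of your PureCN version.

```shell
# Hypothetical dry-run helper: prints PureCN.R arguments, one per line.
#   $1 = sample id, $2 = per-sample output directory
build_purecn_args() {
  sampleid=$1
  outdir=$2
  seg="$outdir/${sampleid}_cnvkit.seg"

  # Arguments needed in both cases: coverage (.cnr) and the Mutect VCF.
  printf '%s\n' --out "$outdir" --sampleid "$sampleid" \
    --tumor "$outdir/${sampleid}_cnvkit.cnr" \
    --vcf "${sampleid}_mutect.vcf" --genome hg19

  if [ -f "$seg" ]; then
    # A CNVkit segmentation exists: pass it on and skip re-segmentation.
    printf '%s\n' --segfile "$seg" --funsegmentation none
  fi
  # Otherwise nothing is added and PureCN segments the log-ratios itself.
}

workdir=$(mktemp -d)
args_without_seg=$(build_purecn_args S1 "$workdir")
touch "$workdir/S1_cnvkit.seg"
args_with_seg=$(build_purecn_args S1 "$workdir")

echo "$args_with_seg"
```

The printed list could then be passed to Rscript $PURECN/PureCN.R once it looks right.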

written 5 months ago by markus.riester480