Question: Running sciClone on cell prevalence (CP) data
marki wrote (3.7 years ago, European Union):


As I understand it, sciClone uses variant allele frequency (VAF) estimates in copy-neutral regions to estimate subclonality (so, in a sense, it does not use copy-number information; please correct me if I am wrong).

The exome sequencing data I have contains relatively few mutations and a lot of copy-number variation. When I remove the non-neutral copy-number regions (copy number != 2), I am left with very few mutations. Since I have multiple samples for each case, the resulting VAF matrix becomes very sparse, which leads to poor results.
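A minimal Python sketch of why the multi-sample matrix gets sparse: if a site is kept only when it is copy-neutral (copy number 2) in every sample, each additional sample can only shrink the set. The `Site` layout, the helper name and the example values are hypothetical, not sciClone's input format.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A mutated position with its segment copy number in each sample (hypothetical layout)."""
    chrom: str
    pos: int
    copy_number: dict  # sample name -> integer copy number


def copy_neutral_sites(sites, samples):
    """Keep only sites whose copy number is exactly 2 in every listed sample."""
    return [s for s in sites if all(s.copy_number.get(name) == 2 for name in samples)]


# Hypothetical example: four mutations called in two samples of one case.
sites = [
    Site("1", 1000, {"A": 2, "B": 2}),
    Site("2", 2000, {"A": 2, "B": 3}),
    Site("3", 3000, {"A": 1, "B": 2}),
    Site("5", 4000, {"A": 2, "B": 2}),
]

for samples in (["A"], ["B"], ["A", "B"]):
    kept = copy_neutral_sites(sites, samples)
    print(samples, len(kept))
```

Each sample alone keeps 3 of the 4 sites, but requiring copy neutrality in both keeps only 2, and the loss compounds with every extra sample.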

I was wondering whether I can combine the VAFs with copy number and ploidy to compute the cancer cell fraction (CCF), or, alternatively, compute cell prevalence (CP) values with PyClone or ASCAT, and feed those into sciClone. Will sciClone's clustering work as it does with VAFs? If yes, that's excellent; if not, can you recommend an alternative tool for subclonal reconstruction?
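For reference, one common way to fold copy number and purity into a CCF is to invert the expected-VAF relation VAF = p * m * CCF / (p * N + 2 * (1 - p)), where p is tumor purity, N is the total tumor copy number at the locus, m is the number of mutated copies per mutant cell (multiplicity), and the normal cells are assumed diploid. The Python sketch below assumes the copy-number state is present in all tumor cells and that m is known; in practice m is often ambiguous, which is part of what tools such as PyClone try to model. The function and argument names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Invert VAF = p*m*CCF / (p*N + normal_cn*(1 - p)) to estimate CCF.

    Assumes the copy-number state at the locus is present in all tumor cells and
    that the contaminating normal cells have copy number `normal_cn` (2 by default).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the sequenced mixture.
    mean_copies = purity * tumor_cn + normal_cn * (1 - purity)
    return vaf * mean_copies / (purity * multiplicity)


# Hypothetical values: 80% purity, copy-neutral site at VAF 0.4 -> CCF close to 1 (clonal).
print(cancer_cell_fraction(0.4, purity=0.8, tumor_cn=2))
# Same purity, single-copy gain (N = 3), one mutated copy, VAF 0.2 -> CCF close to 0.7.
print(cancer_cell_fraction(0.2, purity=0.8, tumor_cn=3))
```

Noisy estimates can land slightly above 1; how to handle those (cap, flag, or leave) is a separate choice.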


Chris Miller (Washington University in St. Louis, MO) wrote (3.7 years ago):

Hi Ikram, I'm going to close your GitHub issue and post my answer here.

The short answer is yes, you can convert any kind of measure to a VAF-equivalent and use it.
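One way to read "VAF-equivalent": a clonal heterozygous mutation in a pure, diploid sample sits at 50% VAF, so a CCF (or a prevalence value such as one from PyClone) can be mapped back onto that scale as 50 * CCF. The Python sketch below does this and rebuilds read counts at the site's observed depth, assuming the five-column sciClone input described in its README (chromosome, position, reference reads, variant reads, VAF as a percentage); please check that layout against the sciClone version you use. The helper name and the cap at CCF = 1 are illustrative choices, not part of sciClone.

```python
def to_vaf_equivalent(chrom, pos, ccf, depth):
    """Map a cancer cell fraction onto the VAF scale of a clonal, heterozygous,
    diploid mutation in a pure sample (CCF 1.0 -> 50%).

    Returns a row (chrom, pos, ref_reads, var_reads, vaf_percent). CCF estimates
    outside [0, 1] are clipped into that range; this is an illustrative choice.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    ccf = min(max(ccf, 0.0), 1.0)
    # Keep the observed depth and split it into pseudo reference/variant counts.
    var_reads = round(depth * 0.5 * ccf)
    ref_reads = depth - var_reads
    # Recompute the VAF from the rounded counts so the row is self-consistent.
    vaf_percent = 100.0 * var_reads / depth
    return chrom, pos, ref_reads, var_reads, vaf_percent


# Hypothetical site: CCF 0.7 at depth 120.
print(to_vaf_equivalent("7", 55000000, ccf=0.7, depth=120))  # ('7', 55000000, 78, 42, 35.0)
```

Keeping the original depth means sciClone's depth filter still sees the real coverage of each site.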

The longer answer is that doing so is complicated; I've described some of the difficulties here. The easiest class of variants to copy-number correct are those in regions with a single-copy deletion. Because each such site will be present in 100% of the tumor cells, it is hard to confuse it with any other subclone/ploidy combination, and it is easy to correct as well (just divide the VAF by 2). Other cases get quite a bit more complicated.
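To make the single-copy-deletion case concrete, here is a small Python sketch of the expected-VAF relation VAF = p * m * CCF / (p * N + 2 * (1 - p)), with purity p, total tumor copy number N, multiplicity m (mutated copies per mutant cell), and diploid normal cells. In a pure sample, a clonal mutation on the one retained allele sits at VAF 1.0, and halving it lands exactly on the 0.5 of a clonal heterozygous diploid site. The last lines show the kind of conflation that makes other states harder: at N = 3, a clonal mutation on one copy and a half-prevalence subclone carrying the mutation on two copies give the same VAF. The helper name is hypothetical.

```python
def expected_vaf(purity, tumor_cn, multiplicity, ccf, normal_cn=2):
    """Expected VAF of a mutation present in a fraction `ccf` of tumor cells, carried on
    `multiplicity` of the `tumor_cn` copies, with diploid normal contamination."""
    mean_copies = purity * tumor_cn + normal_cn * (1 - purity)
    return purity * multiplicity * ccf / mean_copies


# Pure tumor, single-copy deletion, clonal mutation on the retained allele.
deleted = expected_vaf(purity=1.0, tumor_cn=1, multiplicity=1, ccf=1.0)
diploid = expected_vaf(purity=1.0, tumor_cn=2, multiplicity=1, ccf=1.0)
print(deleted, deleted / 2, diploid)  # 1.0 0.5 0.5

# Ambiguity at copy number 3: two different histories, one observed VAF.
clonal_one_copy = expected_vaf(purity=1.0, tumor_cn=3, multiplicity=1, ccf=1.0)
subclonal_two_copies = expected_vaf(purity=1.0, tumor_cn=3, multiplicity=2, ccf=0.5)
print(clonal_one_copy, subclonal_two_copies)
```

With normal contamination the halving becomes approximate: at purity 0.6 the deletion site's expected VAF is 0.6 / 1.4 (about 0.43), so half of it is about 0.21, whereas a clonal heterozygous diploid site would sit at 0.30.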


Hi Chris,

Thanks for the prompt reply. I think the problem is even more complex, since copy-number calling from exome-seq data is itself not as reliable as from whole-genome data. However, given the relatively low number of mutations, one is always tempted to use as many of them as possible.

(Reply by marki, 3.7 years ago)

Indeed, I understand the pain. One more thing you can try is to look at the depth of your data and see whether it makes sense to lower your minimum depth setting. You'll lose a little power to discriminate between clusters that are very close together, but you may gain additional data points.

(Reply by Chris Miller, 3.7 years ago)
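A rough Python sketch of the depth trade-off, using hypothetical depths: lowering the depth cutoff keeps more sites, but the binomial sampling error of each VAF, sqrt(v * (1 - v) / n) at depth n, grows, so nearby clusters blur together. In sciClone itself the cutoff is an argument of the clustering call (`minimumDepth` in the versions I am aware of; please check `?sciClone` for yours). The helper names below are hypothetical.

```python
import math


def vaf_standard_error(vaf, depth):
    """Binomial standard error of an observed VAF (as a fraction) at a given read depth."""
    return math.sqrt(vaf * (1 - vaf) / depth)


def sites_passing(depths, min_depth):
    """Count sites whose total depth meets the cutoff."""
    return sum(1 for d in depths if d >= min_depth)


# Hypothetical per-site depths for one sample.
depths = [45, 60, 80, 110, 150, 220]

for cutoff in (100, 50):
    kept = sites_passing(depths, cutoff)
    # Largest possible s.e. among kept sites: VAF 0.5 at exactly the cutoff depth.
    se = vaf_standard_error(0.5, cutoff)
    print(f"min depth {cutoff}: {kept} sites kept, max VAF s.e. ~ {se:.3f}")
```

Here halving the cutoff from 100 to 50 keeps 5 sites instead of 3, at the cost of a worst-case standard error rising from 0.050 to about 0.071.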