Question: Identifying the tumor clones and subclones with VAF from VarScan2
5.8 years ago by
Seattle,WA, USA
ivivek_ngs4.9k wrote:

Dear Users,

I have a single tumor with its matched normal, and I have the germline and somatic mutations with the frequencies reported by VarScan2. How can I use this information, together with the genomic region and gene each mutation falls in, to outline the clonal and subclonal populations of mutations in the tumor? Is there a method that can build a model showing which mutations are present in the entire tumor population and which are present in only part of it? That would help me identify the subclones of the tumor, see how the mutations are distributed across the tumor mass, and estimate what fraction of the tumor carries each mutation. Such a classification helps to reconstruct the evolution of the tumor and would also let me list the potential driver and passenger mutations. Is there a tool that can do this? Most of the tools I have found work on multiple samples. I would appreciate any suggestions along these lines.

sequencing snp next-gen exome • 4.2k views
modified 5.8 years ago • written 5.8 years ago by ivivek_ngs4.9k

Not sure if this is what you are asking, but if you have the VAFs of all somatic mutations from a tumor/normal pair, you can use mclust to identify clusters (biologically, clones).

written 5.8 years ago by poisonAlien2.8k
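
A minimal sketch of the mclust suggestion above, assuming the mclust package is installed. The VAFs are simulated toy values, not VarScan output, and the two-population setup (means of 0.45 and 0.20) is purely hypothetical.

```r
# Toy example: cluster tumor VAFs with a 1-D Gaussian mixture (mclust).
# The VAFs below are simulated; replace them with your own somatic VAFs.
library(mclust)

set.seed(1)
vaf <- c(rnorm(60, mean = 0.45, sd = 0.03),   # hypothetical founding clone
         rnorm(40, mean = 0.20, sd = 0.03))   # hypothetical subclone
vaf <- pmin(pmax(vaf, 0), 1)                  # keep values inside [0, 1]

# Let mclust choose the number of mixture components (1 to 4) by BIC
fit <- Mclust(vaf, G = 1:4)

clusters <- data.frame(vaf = vaf, cluster = fit$classification)
print(fit$G)                                          # number of clusters chosen
print(tapply(clusters$vaf, clusters$cluster, mean))   # mean VAF per cluster
```

If the sites sit in copy-neutral regions and tumor purity is high, a cluster centered near 50% VAF would be consistent with heterozygous mutations in the founding clone, and lower-VAF clusters with subclones. That reading is an assumption to check against purity and copy number, not something the clustering itself establishes.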

@poisonAlien - Please correct me if I am wrong. The VarScan2 output for a normal/tumor pair reports a variant frequency for both the normal and the tumor sample at each site. To get the frequency of a somatic variant, I should use the frequency in the tumor, right? I am talking about the tumor_var_freq column in the VarScan output; this is the VAF I mean for the somatic mutations. Please let me know whether this is correct.

written 5.8 years ago by ivivek_ngs4.9k


I am not familiar with the VarScan2 output. But if you are interested in somatic mutations, you should be looking at the VAF (no. of reads supporting the variant allele / total no. of reads) in the tumor. At the same position, the VAF in the normal should be far lower (maybe 0 or <0.05?). I have also seen some papers that classify a variant as somatic if the fold change in VAF between tumor and normal is > 5 (just a filtering criterion).

written 5.8 years ago by poisonAlien2.8k
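
The VAF definition and the two filters mentioned above can be sketched in base R. The table is a made-up, trimmed stand-in for VarScan2 somatic output (a real file has more columns), and the cutoffs (normal VAF < 0.05, fold change > 5) are just the values floated in the comment, not recommended defaults.

```r
# Toy VarScan2 "somatic" output, trimmed to the columns used here.
# All rows are invented for illustration.
txt <- "chrom\tposition\tnormal_reads1\tnormal_reads2\tnormal_var_freq\ttumor_reads1\ttumor_reads2\ttumor_var_freq\tsomatic_status
1\t1000\t50\t0\t0%\t30\t20\t40%\tSomatic
1\t2000\t49\t1\t2%\t40\t10\t20%\tSomatic
2\t3000\t25\t25\t50%\t26\t24\t48%\tGermline
2\t4000\t45\t5\t10%\t35\t15\t30%\tSomatic"
calls <- read.delim(text = txt, stringsAsFactors = FALSE)

# VAF = variant-supporting reads / total reads (reads1 = reference, reads2 = variant)
calls$tumor_vaf  <- calls$tumor_reads2  / (calls$tumor_reads1  + calls$tumor_reads2)
calls$normal_vaf <- calls$normal_reads2 / (calls$normal_reads1 + calls$normal_reads2)

# The same tumor VAF, parsed from the percent string in tumor_var_freq
calls$tumor_vaf_reported <- as.numeric(sub("%", "", calls$tumor_var_freq)) / 100

# Filters: called Somatic, low VAF in the normal, and tumor/normal fold change > 5
fold <- calls$tumor_vaf / calls$normal_vaf   # Inf when the normal VAF is 0
keep <- calls$somatic_status == "Somatic" & calls$normal_vaf < 0.05 & fold > 5
somatic <- calls[keep, ]
print(somatic[, c("chrom", "position", "tumor_vaf", "normal_vaf")])
```

In this toy table the first two rows pass; the third is a germline call and the fourth has too much variant signal in the normal.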

Thanks a lot for the suggestion. I would like to know if you are familiar with GATK multi-sample calling. I have BAM files for a tumor, its paired normal, and two reprogrammed clones (single clones) derived from the tumor. I want to call variants on all 4 BAM files together to see which tumor-specific mutations are still carried in the reprogrammed clones. Can that be done with GATK? Can you tell me what script would handle this type of call? Any suggestions would be welcome. Thanks a lot.

written 5.8 years ago by ivivek_ngs4.9k

Thanks for the information. Sorry, I am not able to use the package. Can you please guide me on how to install sciClone?

Can you tell me how to download sciClone from GitHub and which other packages need to be downloaded with it? I don't understand how to download sciClone from GitHub and use it in R locally on my Mac. It is not on CRAN or Bioconductor, so I have to download the package locally and then load it in R. However, my R version is 2.15. Is that compatible with sciClone?

modified 5.8 years ago • written 5.8 years ago by ivivek_ngs4.9k

I have upgraded my R version and installed sciClone. I will now try to understand how it works. Thanks a lot.

written 5.8 years ago by ivivek_ngs4.9k
5.8 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

We built the sciClone package for exactly this purpose:

It takes inputs of somatic mutations, with readcounts and VAFs, and uses that information to infer subclonal populations in heterogeneous tumors. It also gives you some nice visualization options.

written 5.8 years ago by Chris Miller21k

Hi Chris,

I am trying to use sciClone now. I see that I need to generate the copy number data and the BED file of LOH regions. Can't I run the clustering without them?


#read in regions to exclude (commonly LOH)
#format is 3-col bed
regions = read.table("data/exclude.loh")

#read in segmented copy number data
#4 columns - chr, start, stop, segment_mean   
cn1 = read.table("data/copy_number_tum1")
cn2 = read.table("data/copy_number_tum2")
cn3 = read.table("data/copy_number_tum3")


Can the clustering be done directly from the tumor VAFs with the 5 columns as mentioned? If I need the LOH and CN files, can you tell me how you generate them? I am using VarScan to find the somatic mutations, so the VarScan output has both LOH and Somatic events, and I can make a data file from them in the format described in your README. I did the copy number analysis with the VarScan copynumber tool and used CBS to generate the profiles across all the chromosomes, so I have some output files from that as well. Can those be used to create the CN files? Also, please let me know whether I can run the clustering without the LOH and CN files.

written 5.8 years ago by ivivek_ngs4.9k

Hi Chris,


I have been using sciClone with the data you provide on GitHub to check whether I can reproduce the results, and I am able to. The only thing I want to know is how you generate the exclude.loh file; I could not understand the file when I looked at it in R. Can you tell me how to generate it? I have tumor samples and LOH SNP files for them as well. Can I use those to generate the exclude-region files, and if so, how? Thanks for the wonderful software. I will wait for your valuable suggestions.

written 5.8 years ago by ivivek_ngs4.9k

Sorry for the slow response, I've been out of town.

Short answer: That input is optional and may not be relevant to all cancers, so you may be fine without it.

Long answer: I believe we take the frequencies of germline SNPs in the tumor genome and look for large regions where there are no het sites, caused by CN-neutral LOH (aka UPD). You can segment them into discrete regions using CBS, as implemented in the DNAcopy package for R. Plotting on a per-chromosome basis should also make them clearly evident, if they exist.

modified 5.8 years ago • written 5.8 years ago by Chris Miller21k
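
Since the short answer is that the copy-number and exclusion inputs are optional, a VAF-only run might look like the sketch below. The variants are simulated, the 5-column layout (chr, pos, ref_reads, var_reads, vaf) follows the format named in the thread, and expressing vaf as a percentage is an assumption worth confirming against the package README. The sciClone call is skipped when the package is not installed, and the output file names are hypothetical.

```r
# Simulated somatic variants at 200x depth from two hypothetical populations
set.seed(1)
n <- 150
depth <- rep(200, n)
true_vaf <- rep(c(0.45, 0.20), times = c(90, 60))
var_reads <- rbinom(n, size = depth, prob = true_vaf)

# 5-column table: chr, pos, ref_reads, var_reads, vaf
vafs <- data.frame(
  chr       = rep(1:5, each = 30),
  pos       = seq(1e6, by = 1e4, length.out = n),
  ref_reads = depth - var_reads,
  var_reads = var_reads
)
vafs$vaf <- 100 * vafs$var_reads / depth   # assumed to be a percentage

# VAF-only run: no copyNumberCalls and no regionsToExclude
if (requireNamespace("sciClone", quietly = TRUE)) {
  library(sciClone)
  sc <- sciClone(vafs = vafs, sampleNames = "tumor1")
  if (!is.null(sc)) {
    writeClusterTable(sc, "clusters_tumor1.txt")
    sc.plot1d(sc, "clusters_tumor1_1d.pdf")
  }
}
```

Without copy-number input, every site is presumably treated as copy-neutral, so variants in amplified or deleted regions can end up in the wrong cluster; that is the trade-off of skipping those files.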

Hi Chris,

I would greatly appreciate your suggestion on this. I have somatic mutations from a primary tumor and a relapse (fewer than 100 each). I have been using sciClone for clonality analysis. Everything is great so far, and thank you for the tool. A small query, though: do you remove SNPs (rsIDs) before inputting into sciClone? Nearly 70% of our somatic mutations can be annotated to dbSNP entries (rsIDs), but they are absent in the germline! I am just wondering whether it is possible for so many mutations to accumulate at dbSNP sites.

written 5.7 years ago by poisonAlien2.8k

Could you please share how you identify LOH regions in your samples?

written 3.4 years ago by Ming Tang2.5k

I don't have a script handy at the moment, but I described it above. Run VarScan on the tumor/normal BAMs, extract the "Germline" or "LOH" calls, then segment with the DNAcopy package for R.

written 3.4 years ago by Chris Miller21k
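
One way to read the recipe above, as a rough sketch rather than the authors' actual script: take germline SNPs that are heterozygous in the normal, measure how far each tumor VAF sits from 0.5, and segment that deviation with CBS from DNAcopy. The SNPs here are simulated, the 0.3 cutoff is arbitrary, and the Bioconductor package DNAcopy is assumed to be installed.

```r
library(DNAcopy)

set.seed(1)
# Simulated germline SNPs that are heterozygous in the normal sample
n <- 600
snps <- data.frame(chrom = rep(c("1", "2"), each = 300),
                   pos   = rep(seq(1e6, by = 1e5, length.out = 300), 2))

# Tumor VAF is ~0.5 where heterozygosity is retained and near 0 or 1 inside
# a simulated LOH block (SNPs 101-200 on chromosome 1)
snps$tumor_vaf <- rnorm(n, mean = 0.5, sd = 0.05)
loh <- 101:200
snps$tumor_vaf[loh] <- sample(c(0.05, 0.95), length(loh), replace = TRUE) +
  rnorm(length(loh), mean = 0, sd = 0.02)
snps$tumor_vaf <- pmin(pmax(snps$tumor_vaf, 0), 1)

# Deviation from 0.5: close to 0 at het sites, close to 0.5 under LOH
snps$dev <- abs(snps$tumor_vaf - 0.5)

# Circular binary segmentation of the deviation track
cna <- CNA(genomdat = snps$dev, chrom = snps$chrom, maploc = snps$pos,
           data.type = "logratio", sampleid = "tumor")
seg <- segment(smooth.CNA(cna), verbose = 0)$output

# Call a segment LOH when its mean deviation exceeds an arbitrary cutoff;
# the three columns form a BED-like table that write.table() could save
loh_segs <- seg[seg$seg.mean > 0.3, c("chrom", "loc.start", "loc.end")]
print(loh_segs)
```

This flags every LOH segment, whether or not copy number changed, so it would still need to be intersected with copy-number calls to keep only the copy-neutral ones. With lower tumor purity the VAFs inside LOH move toward 0.5 and the cutoff would have to drop accordingly.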

I have segment output from sequenza. Should I grep out the LOH regions where CNt (tumor copy number) equals 1 for each sample, and merge the LOH regions from all samples to get a common LOH file?

"chromosome"    "start.pos"     "end.pos"       "Bf"    "N.BAF" "sd.BAF"        "depth.ratio"   "N.ratio"       "sd.ratio"      "CNt"   "A"     "B"     "LPP"
"1"     10007   17201620        0.311182913031387       987     0.118208809974812       0.917333786712282       94928   0.302492249516788       2       2       0       -7.3
"1"     17203565        109649162       0.318065461132322       1640    0.0980762061608926      0.8435047799082 285840  0.266303069182703       2       2       0       -6.7

Thanks, Tommy

written 3.4 years ago by Ming Tang2.5k

I'm not familiar with sequenza, so I will not be of much help there. Your main concern is finding regions of LOH that are CN-neutral (aka UPD), so that they can be appropriately excluded.

written 3.4 years ago by Chris Miller21k

So the excluded LOH regions are copy-number-neutral LOH regions only. Those still have CNt=2, but how does one identify them? I thought that if it were LOH regions in general, I would just need to look for regions with CNt=1 (one copy lost). Thanks!

written 3.4 years ago by Ming Tang2.5k
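
A possible way to separate the two cases in the sequenza table quoted earlier, offered as a sketch and not as an answer from the sciClone authors. It assumes the A and B columns are the copy numbers of the major and minor alleles, so B = 0 means one parental allele is gone, and it treats CNt = 2 as "neutral", which only holds for a near-diploid tumor.

```r
# The two segment rows quoted in the thread, reduced to the columns used here
segs <- data.frame(
  chromosome = c("1", "1"),
  start.pos  = c(10007, 17203565),
  end.pos    = c(17201620, 109649162),
  CNt        = c(2, 2),
  A          = c(2, 2),
  B          = c(0, 0)
)

# Assumption: A/B are major/minor allele copy numbers, so B == 0 marks LOH
loh_neutral <- segs[segs$CNt == 2 & segs$B == 0, ]   # copy-neutral LOH (UPD)
loh_loss    <- segs[segs$CNt == 1 & segs$B == 0, ]   # LOH from a one-copy loss

# 3-column BED-like table of regions to exclude
bed <- loh_neutral[, c("chromosome", "start.pos", "end.pos")]
print(bed)
```

Under these assumptions both quoted segments (CNt = 2, A = 2, B = 0) would count as copy-neutral LOH, whereas a CNt = 1 filter would select neither.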