Identifying the tumor clones and subclones with VAF from VarScan2
7.4 years ago
ivivek_ngs ★ 5.1k

Dear Users,

I have a single tumor with its matched normal, and VarScan2 has given me the germline and somatic mutations with their frequencies. How can I use these frequencies, together with the gene each mutation falls in, to outline the clonal and subclonal populations in the tumor? Is there a method that builds a model showing which mutations are present in the entire tumor cell population and which are present in only a fraction of it? That would let me identify the subclones of the tumor, see how the mutations are distributed across the tumor mass, and estimate how large a share of the tumor each mutation occupies. Such a classification helps reconstruct the evolution of the tumor and would also let me list its potential driver and passenger mutations. Is there a tool that can do this? Most of the tools I have found work on multiple samples. I would appreciate any suggestions.

exome sequencing SNP next-gen • 4.7k views

I'm not sure if this is what you are asking, but if you have the VAFs of all somatic mutations from a tumor/normal pair, you can use mclust to identify clusters (biologically, clones).
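To make the clustering idea concrete, here is a minimal sketch in plain Python. It is not mclust (which fits Gaussian mixture models in R); it swaps in a simple 1-D k-means as a stand-in, and the function name `cluster_vafs`, the choice of k, and the example VAFs are all hypothetical.

```python
def cluster_vafs(vafs, k=2, iterations=50):
    """Group VAFs (fractions between 0 and 1) into k clusters with 1-D k-means.

    Stand-in for a proper mixture model; k must be chosen by the user here.
    """
    if k < 1 or k > len(vafs):
        raise ValueError("k must be between 1 and the number of VAFs")
    ordered = sorted(vafs)
    # Deterministic start: spread the initial centers evenly over the sorted values.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vafs:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in vafs]
    return centers, labels


# Hypothetical example: three mutations near 0.5 and three near 0.12.
centers, labels = cluster_vafs([0.48, 0.50, 0.52, 0.10, 0.12, 0.14], k=2)
```

In a pure tumor, in a diploid copy-neutral region, a cluster centered near 0.5 would be consistent with heterozygous mutations present in all tumor cells, and lower clusters with subclones. Purity and copy number shift these values, so treat the raw clusters as a first look only.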

@poisonAlien - Please correct me if I am wrong. The VarScan2 output for a normal/tumor pair reports the variant frequency in both the normal and the tumor. To get the frequency of a somatic variant, I should use the frequency in the tumor, right? That is the tumor_var_freq column in the VarScan output, and it is the VAF I mean when considering somatic mutations. Please let me know whether this is correct.

Hi,

I am not familiar with the VarScan2 output. But if you are interested in somatic mutations, you should be looking at the VAF (number of reads supporting the variant allele / total number of reads) in the tumor. At the same position, the VAF in the normal should be far lower (maybe 0 or < 0.05?). I have also seen some papers that classify a variant as somatic if the fold change of the VAF between tumor and normal is > 5 (just a filtering criterion).
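The two rules of thumb above can be sketched as follows. This is only an illustration: the function names are hypothetical, the 0.05 and 5x thresholds are the ones mentioned above, and combining them into a single filter is my assumption rather than an established standard.

```python
def vaf(var_reads, ref_reads):
    """Variant allele fraction: variant reads / total reads (0.0 if no coverage)."""
    total = var_reads + ref_reads
    return var_reads / total if total else 0.0


def looks_somatic(tumor_ref, tumor_var, normal_ref, normal_var,
                  max_normal_vaf=0.05, min_fold=5.0):
    """Rough somatic filter: low VAF in the normal and a large tumor/normal fold change."""
    t = vaf(tumor_var, tumor_ref)
    n = vaf(normal_var, normal_ref)
    if t == 0.0 or n >= max_normal_vaf:
        return False
    if n == 0.0:
        # No variant reads in the normal, so the fold change is unbounded.
        return True
    return t / n > min_fold


# Hypothetical read counts: tumor 60 ref / 40 var, normal 98 ref / 2 var.
print(looks_somatic(60, 40, 98, 2))
```

A variant caller's own statistics (for example the somatic p-value VarScan reports) should take precedence; a filter like this is only a sanity check on top of them.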

Thanks a lot for the suggestion. Are you familiar with GATK multi-sample calling? I have BAM files for the tumor, its paired normal, and two reprogrammed clones (each a single clone) derived from the tumor. If I want to call variants on all 4 BAM files together, to find out which tumor-specific mutations are still carried in the reprogrammed clones, can that be done with GATK? Could you point me to a script that handles this type of call? Any suggestions would be appreciated. Thanks a lot.

Thanks for the information. Sorry, I am not able to use the package. Can you please guide me on how to install sciClone?

Can you tell me how to download sciClone from GitHub and which other packages it needs? I don't understand how to download it from GitHub and use it in R locally on my Mac. It is not on CRAN or Bioconductor, so I have to download the package and then load it into R. Also, my R version is 2.15. Is that compatible with sciClone?

I have upgraded my R version and installed sciClone. I will now try to understand how it works. Thanks a lot.

7.4 years ago

We built the sciClone package for exactly this purpose: https://github.com/genome/sciclone

It takes inputs of somatic mutations, with readcounts and VAFs, and uses that information to infer subclonal populations in heterogeneous tumors. It also gives you some nice visualization options.
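For anyone starting from VarScan somatic output, a conversion to a read-count-plus-VAF table might look like the sketch below. Several things here are assumptions to check against the sciClone README and your VarScan version: the five-column order (chr, pos, ref reads, var reads, VAF), the VAF being on a 0-100 scale, the VarScan field names `tumor_reads1` and `tumor_reads2`, and `tumor_var_freq` being a string such as "40.00%". The helper names are hypothetical.

```python
def parse_percent(text):
    """Turn a percentage string such as '40.00%' into the float 40.0."""
    return float(text.strip().rstrip("%"))


def to_vaf_row(chrom, pos, tumor_reads1, tumor_reads2, tumor_var_freq):
    """Build one (chr, pos, ref_reads, var_reads, vaf_percent) row.

    Assumes tumor_reads1 = reference-supporting reads and
    tumor_reads2 = variant-supporting reads in the tumor.
    """
    return (chrom, int(pos), int(tumor_reads1), int(tumor_reads2),
            parse_percent(tumor_var_freq))


# Hypothetical VarScan fields for one somatic call.
row = to_vaf_row("1", "12345", "30", "20", "40.00%")
print("\t".join(str(field) for field in row))
```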

Hi Chris,

I am trying to use sciClone now. I see that I need to generate the copy number data and the BED file of LOH regions. Can't I run the clustering function without them?

#read in regions to exclude (commonly LOH)
#format is 3-col bed

#read in segmented copy number data
#4 columns - chr, start, stop, segment_mean
cn3 = read.table("data/copy_number_tum3")

Can the clustering be done directly from the tumor VAFs with the 5 columns as mentioned? If I do need the LOH and CN files, can you tell me how you generate them? I am using VarScan to find the somatic mutations, so its output contains both LOH and Somatic events, and I can build data files from them in the format described in your README. I also ran copy number analysis with VarScan copynumber and used CBS to generate profiles across all the chromosomes, so I have output files from that as well. Can those be used to create the CN files? And please let me know whether I can run the clustering without the LOH and CN files.

Hi Chris,

I have been using sciClone with the data you provide on GitHub to check whether I can reproduce the results, and I can. The only thing I want to know is how you generate the exclude.LOH file; I could not understand the file when I looked at it in R. Can you tell me how to generate it? I have tumor samples with LOH SNP files. Can I use them to generate the exclude-region files, and if so, how? Thanks for the wonderful software. I look forward to your suggestions.

Sorry for the slow response, I've been out of town.

Short answer: That input is optional and may not be relevant to all cancers, so you may be fine without it.

Long answer: I believe we take the frequencies of germline SNPs in the tumor genome and look for large regions where there are no het sites, caused by CN-neutral LOH (aka UPD). You can segment them into discrete regions using CBS, as implemented in the DNAcopy package for R. Plotting on a per-chromosome basis should also make them clearly evident, if they exist.
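The idea of looking for stretches without het sites can be sketched in a few lines. This is not CBS and not what DNAcopy does; it is a much cruder run-length stand-in, and the function name, the 0.3 deviation threshold, and the minimum of 3 SNPs per region are all hypothetical choices. The input is assumed to be SNPs that are heterozygous in the normal, sorted by chromosome and position, with their VAF in the tumor as a fraction.

```python
def loh_runs(snps, max_dev=0.3, min_snps=3):
    """Find runs of consecutive germline het SNPs whose tumor VAF is far from 0.5.

    snps: iterable of (chrom, pos, tumor_vaf), sorted by chrom then pos.
    Returns a list of (chrom, start, end) candidate regions.
    """
    regions = []
    run = []  # (chrom, pos) of the skewed SNPs in the current run

    def flush():
        # Report the current run if it is long enough, then start over.
        if len(run) >= min_snps:
            regions.append((run[0][0], run[0][1], run[-1][1]))
        run.clear()

    for chrom, pos, tumor_vaf in snps:
        skewed = abs(tumor_vaf - 0.5) > max_dev
        if run and (not skewed or chrom != run[0][0]):
            flush()
        if skewed:
            run.append((chrom, pos))
    flush()
    return regions


# Hypothetical SNPs: one run of three skewed sites on chr1, a short run on chr2.
example = [("1", 100, 0.50), ("1", 200, 0.95), ("1", 300, 0.03),
           ("1", 400, 0.98), ("1", 500, 0.48), ("2", 100, 0.90), ("2", 200, 0.10)]
print(loh_runs(example))
```

A run found this way only says that heterozygosity is lost. Whether it is CN-neutral still has to be checked against the copy number data for the same region.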

Hi Chris,

I would greatly appreciate your suggestion on this. I have somatic mutations from a primary tumor and a relapse (fewer than 100 each). I have been using sciClone for clonality analysis. Everything is great so far, and thank you for the tool. A small query, though: do you remove known SNPs (rsIDs) before inputting mutations into sciClone? Nearly 70% of our somatic mutations can be annotated with dbSNP rsIDs, yet they are absent from the germline! I am wondering whether it is plausible for so many mutations to accumulate at dbSNP sites.

Could you please share how you identify LOH regions in your samples?

I don't have a script handy at the moment, but I described it above. Run VarScan on the tumor/normal BAMs, extract the "Germline" or "LOH" calls, then segment with the DNAcopy package for R.

I have segment output from sequenza. Should I grep out the LOH regions where CNt (tumor copy number) equals 1 for each sample, and merge the LOH regions from all samples to get a common LOH file?

"chromosome"    "start.pos"     "end.pos"       "Bf"    "N.BAF" "sd.BAF"        "depth.ratio"   "N.ratio"       "sd.ratio"      "CNt"   "A"     "B"     "LPP"
"1"     10007   17201620        0.311182913031387       987     0.118208809974812       0.917333786712282       94928   0.302492249516788       2       2       0       -7.3
"1"     17203565        109649162       0.318065461132322       1640    0.0980762061608926      0.8435047799082 285840  0.266303069182703       2       2       0       -6.7


Thanks, Tommy

I'm not familiar with sequenza, so I will not be of much help there. Your main concern is finding regions of LOH that are CN-neutral (aka UPD), so that they can be appropriately excluded.

So the LOH regions to exclude are copy-number-neutral LOH regions only. Those still have CNt=2, but how does one identify them? I thought that for LOH regions I would just need to look for regions with CNt=1 (one copy lost). Thanks!
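One possible way to pull such regions out of a segments table like the one quoted above is sketched here. It assumes that, in the sequenza output, A and B are the copy numbers of the major and minor allele, so a CN-neutral LOH segment would have CNt=2 with B=0 (both copies from the same allele). That reading of the columns, and the function name, are assumptions to verify against the sequenza documentation.

```python
def cn_neutral_loh(lines):
    """Return (chromosome, start, end) for segments with CNt == 2 and B == 0.

    lines: header line plus data lines, whitespace-separated, optionally quoted.
    """
    rows = [[field.strip('"') for field in line.split()]
            for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    regions = []
    for row in body:
        total_copies = int(row[col["CNt"]])
        minor_copies = int(row[col["B"]])
        if total_copies == 2 and minor_copies == 0:
            regions.append((row[col["chromosome"]],
                            int(row[col["start.pos"]]),
                            int(row[col["end.pos"]])))
    return regions


# Hypothetical segments, reduced to the columns the filter needs.
table = [
    '"chromosome" "start.pos" "end.pos" "CNt" "A" "B"',
    '"1" 10007 17201620 2 2 0',   # two copies, minor allele lost: CN-neutral LOH
    '"1" 17203565 20000000 2 1 1',  # normal heterozygous diploid
    '"2" 5000 900000 1 1 0',      # one copy lost: LOH by deletion, not CN-neutral
]
for chrom, start, end in cn_neutral_loh(table):
    print(chrom, start, end, sep="\t")
```

Under that reading, both segments quoted above (CNt=2, A=2, B=0) would pass this filter, while CNt=1 segments are LOH by deletion and would be handled through the copy number input instead.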