sciClone on exome sequencing data
5.6 years ago
cbst ▴ 160

I have the following questions regarding sciClone and whole-exome sequencing data:

1. Can it be used for exome sequencing data? (Has anyone used it with WES and got decent results?)
2. Should I make any special considerations when building the VAF file from my exome sequencing results, such as criteria for variant selection?
3. Which tool gives the best copy number predictions on exome sequencing data? I have both normal and tumor for my samples.

I've used sciClone on my exome data and got reasonable results, although in my case I don't have a matched normal. I don't think any special considerations are required beyond sciClone's own protocol. See this discussion; Chris has suggested CN prediction tools as well.

OK, thank you for your reply. And was sciClone fast to generate output? When I tried for one sample it looked like it got stuck on the first step. I did not get any output other than "checking input data ..." for several hours. I made the VAF file from the Mutect VCF file, and I used ASCAT for copy number prediction after preprocessing the exome files, but I think I would rather use a tool made for exome sequencing data, hence my question. It is just that sciClone also recommended ASCAT.

5.6 years ago
• Yes, we frequently use sciClone on exome sequencing data. As long as you have a reasonable number of mutations in your sample, there should be no issues.

• There are no "special considerations" necessary, except to only include the variant calls that you believe are somatic.

• No, sciClone should be quite fast (minutes to an hour at most). If it's taking longer than that, examine your sample a bit more closely. If you have both tens of thousands of mutations and poorly defined clusters, it will take longer. You might consider reducing the number of variants (and increasing your confidence in their position) by increasing the minimum depth variable.

• Copy number calls can be made using the algorithm of your choice - VarScan, cn.mops, many others. I don't recall ever recommending ASCAT, except possibly to say that allele-specific assignments of CN information may be incorporated into future versions. It may work fine, but I have no personal experience running it.
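To illustrate the depth-filtering suggestion above, here is a minimal Python sketch that pre-filters variants by total read depth and builds rows for a VAF table. It is not sciClone code: the function name `filter_by_depth`, the threshold of 100, and the toy read counts are all made up for illustration. The five-column layout (chromosome, position, reference reads, variant reads, VAF in percent) is my understanding of the input sciClone expects, so check it against the package documentation.

```python
def filter_by_depth(rows, min_depth=100):
    """Keep variants whose total depth (ref + var reads) is at least min_depth.

    rows: iterable of (chrom, pos, ref_reads, var_reads) tuples.
    Returns (chrom, pos, ref_reads, var_reads, vaf_percent) tuples.
    """
    kept = []
    for chrom, pos, ref_reads, var_reads in rows:
        depth = ref_reads + var_reads
        if depth >= min_depth:
            vaf = 100.0 * var_reads / depth  # VAF on a 0-100 scale
            kept.append((chrom, pos, ref_reads, var_reads, round(vaf, 2)))
    return kept


# Toy somatic calls (hypothetical numbers)
variants = [
    ("1", 1000, 90, 30),   # depth 120, kept
    ("1", 2000, 20, 10),   # depth 30, dropped
    ("2", 5000, 150, 50),  # depth 200, kept
]
vaf_table = filter_by_depth(variants, min_depth=100)
```

Raising the threshold trades the number of variants for tighter VAF estimates, which is the same trade-off described above.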


Thanks Chris for the very quick response! I will try as you said, and hopefully it will work after I go through my variant calls, and use another tool for copy number prediction.


As a final comment, here is where I read about using ASCAT for WES in the sciClone paper:

As with other tools [6], [11], [22], [30], regions of CNA and LOH are provided as inputs after having been inferred from whole-exome sequencing (WES, e.g., via ASCAT [...]


Does sciClone work on targeted panel samples as well?


Yes, but the more variants you have, the better you'll be able to define the clusters.

5.5 years ago
cbst ▴ 160

New question:

Does sciClone calculate/take into account tumor percentage when calculating VAF? Is there a way to give it as input?

The reason I ask is that my VAF plots are not scaled to tumor percentage. It looks as though the VAF of a subclone increases between primary and relapse, but after scaling to tumor percentage it might actually be the same VAF at both timepoints; one biopsy may simply contain more tumor cells than the other.

I have information on tumor percentage for my samples, and I could of course upscale the VAFs myself, but I would rather have the nice sciClone plots, and I am also curious how sciClone handles tumor percentage.


The short answer is that it doesn't. There are several options for dealing with impure tumors:

• zoom in by using the "xlim" and "ylim" params on sc.plot2d() to set the maximum VAF of the plot

• make sure your copy number data is scaled appropriately (preferably during calling), or alter the copyNumberMargins parameter (otherwise you may not be excluding all CN events)


I see that in the paper you published in Nature Communications you write the following:

Variant allele fractions of all tier 1 variants were corrected for purity by reducing the number of reference-supporting reads in proportion to the purity of the sample. This effectively scales up the VAFs in such a way that founding clone variants are near 50% VAF.

I couldn't find the code that actually did that, so I guess it was part of the pre-processing, but to be sure that I understand correctly, I thought I would give an example:

If you have 60 reference-supporting reads and a tumor purity of 70% (meaning 70% tumor and 30% normal), do you multiply the number of reference reads by 0.70 or by 0.30?


If you want to account for purity, it's simple - just alter the VAFs (but leave the readcounts alone! They carry information about the amount of error). So if your observed VAF is 20% and your purity is 0.7, then your corrected VAF is 20/0.7 = 28.57%. Or if it's 25% VAF and 0.5 purity, you end up with 50%, just as you'd expect.
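The arithmetic above can be sketched in a few lines of Python. This is only an illustration of dividing the observed VAF by purity, not code from sciClone; the function name `purity_corrected_vaf` and the primary/relapse numbers are hypothetical.

```python
def purity_corrected_vaf(observed_vaf, purity):
    """Scale an observed VAF (in percent) by tumor purity (fraction in (0, 1])."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return observed_vaf / purity


# The two examples from the post above
example_a = purity_corrected_vaf(20, 0.7)   # about 28.57
example_b = purity_corrected_vaf(25, 0.5)   # 50.0

# Hypothetical primary/relapse pair: the raw VAF doubles, but so does purity,
# so the corrected VAF is the same at both timepoints.
primary = purity_corrected_vaf(10, 0.4)
relapse = purity_corrected_vaf(20, 0.8)
```

Only the VAF column is rescaled here; the read counts stay untouched, as recommended above.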


OK, that's what I did, but I was unsure whether it was the right thing to do because I am still getting VAFs larger than 50%. I was expecting VAFs of 50% at most.

Do you know what that can be due to? I am thinking that maybe there might be regions of LOH that were not included in my exclude.loh file. Do you think sciClone gives more accurate results if I exclude those variants with VAF > 50%?


Some VAFs may be a bit above 50% due to sampling error, but if there are a large number of them in a cluster clearly above 50%, then yes, there is probably CN or LOH involved. Check your CN and LOH calls, and don't forget to scale them for purity as well.
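As a rough sketch of what scaling copy number for purity can look like, the Python below inverts a simple two-component mixture: observed CN = purity × tumor CN + (1 − purity) × 2. This is a standard simplification rather than anything taken from sciClone, and it assumes absolute (not log2) copy number values and a diploid normal. The function name `purity_scaled_cn` is hypothetical.

```python
def purity_scaled_cn(observed_cn, purity, normal_cn=2.0):
    """Estimate tumor copy number from the observed (mixed) copy number.

    Assumes observed_cn = purity * tumor_cn + (1 - purity) * normal_cn.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return (observed_cn - (1 - purity) * normal_cn) / purity


# An observed CN of 2.5 in a 50% pure sample corresponds to 3 copies in the tumor
tumor_cn = purity_scaled_cn(2.5, 0.5)
```

Without this scaling, a real single-copy gain in an impure sample can sit close enough to 2 that it is treated as copy-neutral.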


I still get some VAFs above 50%, and I think they occur in regions with no CN calls, which sciClone assumes to have a copy number of 2. One way to exclude them would be to add these regions to the exclude file. But since sciClone already detects those regions, wouldn't it be easier to add an option "excludeRegionsWithMissingCN" that can be set to TRUE when calling sciClone?


CN regions leave symmetrical VAFs. So for every cluster at 66% (from 3x copy number), there's a corresponding region at 33% (from the strand that didn't get amplified). If you just exclude the points in that 66% cluster (because they're > 50%) then you leave the others behind and may infer a subclone where there is actually none.

The right answer is to go back and improve your CN calls. As with most algorithms, garbage in garbage out. It would be nice to add joint calling of CN and clonality, but that's a more difficult problem that I haven't had time to tackle. There are other packages out there that attempt to do so, but I can't offer specific recommendations, as I haven't benchmarked them.
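The symmetry described above (clusters at 66% and 33% for a three-copy region) follows from a simple expected-VAF calculation, sketched here in Python. The formula is the usual mixture model for a clonal variant, not something taken from sciClone, and it assumes a diploid normal. The function name `expected_vaf` is hypothetical.

```python
def expected_vaf(mutated_copies, tumor_cn, purity=1.0, normal_cn=2):
    """Expected VAF (percent) of a clonal variant on `mutated_copies` of
    `tumor_cn` total copies, in a sample with the given tumor purity."""
    variant_alleles = purity * mutated_copies
    total_alleles = purity * tumor_cn + (1 - purity) * normal_cn
    return 100.0 * variant_alleles / total_alleles


# Three-copy region in a pure tumor
on_amplified = expected_vaf(2, 3)      # variant on the amplified copy, about 66.7
on_other = expected_vaf(1, 3)          # variant on the other copy, about 33.3

# Copy-neutral heterozygous variant for comparison
neutral = expected_vaf(1, 2)           # 50.0
```

Dropping only the points above 50% would remove the first group and leave the second behind at about 33%, which could then be mistaken for a subclone.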


So sciClone handles CN gain regions internally, as long as one feeds it a segmentation file from a program such as VarScan or cn.mops, but LOH regions need to be specified explicitly in a separate LOH exclude file?


Yes, sciClone will exclude CN regions given to it in that file. If there are LOH regions that are copy number neutral (CN2), then those will need to be excluded separately.
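To check beforehand how many variants fall in copy-neutral LOH regions, a small interval filter is enough. The Python sketch below is only an illustration of that logic and is not how sciClone applies exclusions internally. The helper names, the toy coordinates, and the 1-based inclusive interval convention are all assumptions.

```python
def in_any_region(chrom, pos, regions):
    """True if (chrom, pos) lies inside any (chrom, start, end) region (inclusive)."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def drop_loh_variants(variants, loh_regions):
    """Return the variants that do not overlap any LOH region.

    variants: list of tuples whose first two fields are (chrom, pos).
    """
    return [v for v in variants if not in_any_region(v[0], v[1], loh_regions)]


# Hypothetical copy-neutral LOH calls and variants
loh_regions = [("17", 1_000_000, 5_000_000)]
variants = [
    ("17", 2_500_000, 40, 60),   # inside the LOH region, removed
    ("17", 7_000_000, 55, 45),   # same chromosome, outside the region, kept
    ("3", 2_500_000, 50, 50),    # different chromosome, kept
]
kept = drop_loh_variants(variants, loh_regions)
```

Comparing `len(kept)` with `len(variants)` gives a quick sense of how much of a high-VAF cluster is explained by LOH.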