Question

Ploidy in copy number analysis

7.4 years ago

rahul.nahar ▴ 10

Hi, for cancer copy number calling, many recent tools report integer copy number calls after estimating tumor purity and ploidy with mathematical models. We have been using Illumina SNP arrays and exome-seq for genome-wide copy number calling, processing the data with ASCAT and Sequenza respectively. We have also tried ABSOLUTE, CHAT, etc. However, we observe low concordance between the ploidy predictions of these tools, especially when tumor purity is low (<30%, which is very common for lung cancer samples). The ploidy values affect the integer copy number calls, which then vary for the same sample depending on the tool used. Also, since it is common practice in recent publications to correct copy number calls relative to ploidy, accurate ploidy and purity predictions become very important, yet >20% of samples show inaccurate predictions from most tools.

My questions are

1. Is it common practice to adjust the ploidy, and hence purity, solutions based on experimentally determined ploidy (say, by FACS) for every sample?
2. Why is it becoming a trend to call copy numbers relative to ploidy, as seen in large-scale genomics papers such as those from TCGA and the pan-cancer studies? Doesn't ploidy (which is the absolute amount of DNA, if my definition is correct) itself change due to copy number alterations in cancer? Partial or whole-genome doubling and aneuploidy can cause both ploidy changes and copy number changes. Also, whatever the ploidy, doesn't it matter more how many copies of a region or gene are present relative to a normal diploid cell (the matched normal) rather than relative to the ploidy of the tumor itself?

It would be a great help if somebody could answer these questions.

Thanks

Rahul

genome • 4.0k views

updated 7.1 years ago by markus.riester ▴ 550 • written 7.4 years ago by rahul.nahar ▴ 10

score 0 · Answer 1 · 2017-03-06

Accurate, automatic ploidy estimation based on coverage and BAFs alone becomes difficult below 30 to 35% purity (in whole-genome data with haplotype phasing, the lower limit is more like 20-25%; higher coverage also helps). 80% concordance would be pretty good for samples below 30% purity and probably means your data is otherwise clean and even. Calling homozygous deletions is especially hard, because they are most often small and cover fewer than 3-4 heterozygous SNPs.
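To make the purity limit concrete, here is a minimal sketch of the usual mixture model for the expected log2 ratio of a segment, given tumor purity and average tumor ploidy. The function names and the two-population assumption (one tumor clone plus diploid normal cells) are mine for illustration; they are not taken from ASCAT, Sequenza, or any other tool mentioned in this thread.

```python
import math


def expected_log2_ratio(copies, purity, tumor_ploidy, normal_ploidy=2.0):
    """Expected log2 ratio of a segment carrying `copies` copies in the tumor.

    Assumes a simple mixture: a fraction `purity` of cells are tumor cells
    with average ploidy `tumor_ploidy`, the rest are normal diploid cells.
    """
    segment_signal = purity * copies + (1.0 - purity) * normal_ploidy
    average_signal = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return math.log2(segment_signal / average_signal)


def one_copy_gain_separation(purity, tumor_ploidy):
    """log2-ratio gap between the baseline state and a single-copy gain."""
    baseline = round(tumor_ploidy)
    return (expected_log2_ratio(baseline + 1, purity, tumor_ploidy)
            - expected_log2_ratio(baseline, purity, tumor_ploidy))


if __name__ == "__main__":
    for purity in (0.8, 0.5, 0.3, 0.2):
        gap = one_copy_gain_separation(purity, tumor_ploidy=2.0)
        print(f"purity={purity:.1f}  one-copy gain shifts log2 ratio by {gap:.3f}")
```

Two things fall out of this model. The gap between adjacent copy number states shrinks as purity drops, so noise swamps the signal. And a 2-copy segment in a diploid tumor and a 4-copy segment in a tetraploid tumor both sit at a log2 ratio of 0, so coverage alone cannot separate those ploidy solutions; the BAFs have to do that work.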

Regarding your questions:

1. It's common to manually curate samples. This also helps you understand which tools work best with your data and assay. Some tools are more sensitive to artifacts, some work better than others in heterogeneous samples, and so on.
2. Calling amplifications by fixed cutoffs is hand-wavy anyway, since cutoffs that work for one gene (like HER2) are clearly not optimal for others (e.g. MYC). It's usually a good idea to distinguish between focal (<2-4 Mb) and broad amplifications. Using a less stringent cutoff for focal amplifications works well in practice. For high-ploidy samples (>4), one can increase the cutoffs by 1. This is mainly useful when samples are not manually curated and one is concerned that the ploidy may be overestimated. Otherwise, I agree with you: calling relative to ploidy doesn't make a lot of sense biologically. It's really just an ad hoc way of reducing false positives.
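The cutoff scheme described above can be sketched roughly as follows. All numeric defaults here (a 3 Mb focal limit, 6 and 7 copies as cutoffs) are placeholder assumptions chosen only to make the example run; they are not recommended values, and the function names are hypothetical.

```python
def amplification_cutoff(segment_length_bp, sample_ploidy,
                         focal_max_bp=3_000_000,  # assumed; the answer gives a 2-4 Mb range
                         focal_cutoff=6,          # assumed placeholder
                         broad_cutoff=7,          # assumed placeholder, stricter than focal
                         high_ploidy=4.0):
    """Return the minimum integer copy number to call a segment amplified."""
    # Focal events get the less stringent cutoff.
    cutoff = focal_cutoff if segment_length_bp < focal_max_bp else broad_cutoff
    # Raise the bar by one copy when the sample ploidy is high (>4),
    # as a guard against an overestimated ploidy solution.
    if sample_ploidy > high_ploidy:
        cutoff += 1
    return cutoff


def is_amplified(copies, segment_length_bp, sample_ploidy, **cutoff_options):
    """True if the segment's integer copy number reaches the cutoff."""
    return copies >= amplification_cutoff(segment_length_bp, sample_ploidy,
                                          **cutoff_options)


if __name__ == "__main__":
    print(is_amplified(6, 1_000_000, sample_ploidy=2.1))   # focal, near-diploid sample
    print(is_amplified(6, 20_000_000, sample_ploidy=2.1))  # broad, near-diploid sample
    print(is_amplified(6, 1_000_000, sample_ploidy=4.5))   # focal, high-ploidy sample
```

In practice the cutoffs would need tuning per gene and per assay, which is the point made above about HER2 versus MYC.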