Question: Determining heteroclonality of mutations in TCGA data
I'm trying to determine the heteroclonality of mutations in specific genes across histologies by using TCGA data. In essence, what percentage of mutations in Gene X are subclonal?

One imperfect approach would be to take the variant allele frequency (VAF) of the mutation in each patient and classify patients with a VAF < 0.5 as having "definite heteroclonality." The logic is that if the mutation is monoallelic and present in 100% of tumor cells, the VAF should be 0.5.
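To spell out the arithmetic behind that 0.5, here is a minimal sketch of the expected VAF as I understand it. The function name and all parameters are my own (hypothetical), and it assumes the locus is diploid in both tumor and normal cells unless told otherwise. "ccf" is the fraction of tumor cells carrying the mutation, and "purity" is the tumor content of the sample.

```python
def expected_vaf(purity, mutant_copies=1, ccf=1.0, tumor_copies=2, normal_copies=2):
    """Expected VAF of a somatic mutation in a mixed tumor/normal sample.

    purity        -- fraction of cells in the sample that are tumor cells
    mutant_copies -- mutated copies per mutated tumor cell (1 = monoallelic, 2 = biallelic)
    ccf           -- fraction of tumor cells carrying the mutation (1.0 = clonal)
    tumor_copies  -- total copies of the locus per tumor cell (assumed 2)
    normal_copies -- total copies of the locus per normal cell (assumed 2)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be in [0, 1]")
    # Mutant alleles come only from the mutated tumor cells.
    mutant_alleles = purity * ccf * mutant_copies
    # Every cell in the sample, tumor or normal, contributes alleles at the locus.
    total_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant_alleles / total_alleles


print(expected_vaf(1.0))                   # clonal, monoallelic, pure tumor: 0.5
print(expected_vaf(0.7))                   # same mutation at 70% purity: about 0.35
print(expected_vaf(0.7, mutant_copies=2))  # clonal and biallelic: about 0.7
print(expected_vaf(0.7, ccf=0.5))          # monoallelic, in half the tumor cells: about 0.175
```

Under these assumptions the observed VAF depends on purity, the number of mutated copies, and the fraction of tumor cells with the mutation all at once, so a single VAF value cannot separate them without extra information.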

One big caveat is that a mutation that is in fact biallelic can have a VAF at or above the cutoff even when it is subclonal, so it will be a false negative. Another big caveat is that this analysis assumes 100% tumor content, which I know is incorrect: TCGA requires >70% tumor content, so a cutoff of 0.35 (0.5 * 0.7) might be more appropriate. This leads to my questions:

1) I'd like to account for the tumor content of each TCGA patient, if possible. Does anyone know if/where this data is available? I've been scouring cBioPortal and publications but can't find the tumor content for each TCGA patient (just the statement that it should always be >70%).

2) Is there a more appropriate VAF cutoff for confidently calling heteroclonality? Should I build some sort of uncertainty into my cutoffs? (I could imagine a truly monoallelic mutation with a VAF of 0.47 being called as a false positive because of variability in the measurements.) Should I be requiring a VAF < 0.5 * Tumor Content? Should I arbitrarily raise/lower my VAF cutoff by 5% to account for measurement error?

3) Would it be more appropriate to bin VAFs into "monoallelic," "biallelic," and "neither" categories? For example, monoallelic = VAF of 0.45 - 0.55, biallelic = VAF of 0.9 - 1.0, and neither = all other VAFs.
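To make questions 2 and 3 more concrete, here is a rough sketch of the kind of depth-aware alternative to fixed cutoffs I have in mind. Instead of shifting the cutoff by an arbitrary 5%, it asks whether the observed alt-read count is plausible under a binomial model with success probability 0.5 * Tumor Content. Everything here is my own assumption rather than an established method: the function names, the labels, the alpha of 0.05, the diploid copy-neutral locus, and the idea that read sampling is the only source of noise.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (needs 0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def classify_vaf(alt_reads, depth, purity, alpha=0.05):
    """Compare an observed alt-read count with the clonal monoallelic expectation.

    Assumes a diploid, copy-neutral locus, so the expected VAF of a clonal
    monoallelic mutation is 0.5 * purity.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    clonal_vaf = 0.5 * purity
    # One-sided tail: chance of seeing this few alt reads (or fewer) if clonal.
    p_at_most = sum(binom_pmf(i, depth, clonal_vaf) for i in range(alt_reads + 1))
    if p_at_most < alpha:
        return "below clonal expectation"
    # Opposite tail: chance of seeing this many alt reads (or more) if clonal.
    p_at_least = sum(binom_pmf(i, depth, clonal_vaf) for i in range(alt_reads, depth + 1))
    if p_at_least < alpha:
        return "above clonal expectation"
    return "consistent with clonal monoallelic"


print(classify_vaf(47, 100, 1.0))  # VAF 0.47 at 100x: consistent with clonal monoallelic
print(classify_vaf(10, 100, 0.7))  # VAF 0.10 at 70% purity: below clonal expectation
print(classify_vaf(70, 100, 0.7))  # VAF 0.70 at 70% purity: above clonal expectation
print(classify_vaf(2, 10, 1.0))    # VAF 0.20 at only 10x: consistent with clonal monoallelic
```

The last call is what worries me about fixed bins: at low depth, even a VAF of 0.2 cannot be distinguished from a clonal mutation. I also realize that "below" here would not prove subclonality (a copy-number gain of the wild-type allele would lower the VAF too), and "above" could mean biallelic or loss of the wild-type allele.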

Thank you for any feedback!

