I'm trying to determine the heteroclonality of mutations in specific genes across histologies by using TCGA data. In essence, what percentage of mutations in Gene X are subclonal?
An imperfect analysis would be to take the variant allele frequency (VAF) for each patient, and define patients with a VAF < 0.5 as having "definite heteroclonality," the logic being that if the mutation is monoallelic in 100% of tumor cells the VAF should be 0.5.
One big caveat is that a mutation that is in fact biallelic can have a VAF of 0.5 or higher even when it is subclonal, so it would be a false negative. Another big caveat is that this analysis assumes 100% tumor content, which I know is incorrect: TCGA requires >70% tumor content (so maybe a cutoff of 0.5 * 0.7 = 0.35 would be more appropriate). Which leads to my questions:
1) I'd like to account for tumor content for each TCGA patient (if possible). Does anyone know if/where this data is available? I've been scouring cBioPortal and publications but can't find the tumor content for each TCGA patient (just the fact that it should always be >70%).
2) Is there a more appropriate cutoff for VAF to confidently call heteroclonality? Should I build some sort of uncertainty into my cutoffs? (I could imagine a VAF of 0.47 that's truly clonal and monoallelic, but is now a false positive because of variability in the measurements.) Should I be requiring a VAF < 0.5 * Tumor Content? Should I arbitrarily raise/lower my VAF cutoff by 5% to account for error in measurements?
3) Would it be more appropriate to bin VAFs into "monoallelic," "biallelic," and "neither" categories? For example:
monoallelic = VAF of 0.45 - 0.55
biallelic = VAF of 0.9 - 1.0
neither = all other VAFs
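To make question 2 concrete, here is a rough Python sketch of the purity-adjusted cutoff I have in mind. It assumes a copy-neutral diploid locus, treats "clonal and monoallelic" as the null hypothesis, and models read sampling as simple binomial noise. The function names (`expected_vaf`, `binom_cdf`, `looks_subclonal`) and the `alpha` default are my own placeholders, not part of any TCGA tool.

```python
from math import comb


def expected_vaf(purity, multiplicity=1, tumor_cn=2, normal_cn=2, ccf=1.0):
    """Expected VAF of a somatic mutation.

    purity       : fraction of cells in the sample that are tumor (0-1)
    multiplicity : mutated copies per mutated tumor cell (1 = monoallelic)
    tumor_cn     : total copy number at the locus in tumor cells (assumed 2)
    normal_cn    : total copy number in normal cells (assumed 2)
    ccf          : fraction of tumor cells carrying the mutation (1 = clonal)
    """
    mutant_copies = purity * ccf * multiplicity
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p).

    Fine for typical exome depths; for very deep coverage a library
    routine such as scipy.stats.binom.cdf would be a safer choice.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def looks_subclonal(alt_reads, depth, purity, alpha=0.05):
    """One-sided test: are there significantly fewer alt reads than a
    clonal, monoallelic, copy-neutral mutation would produce?"""
    p_clonal = expected_vaf(purity)  # equals 0.5 * purity under these assumptions
    return binom_cdf(alt_reads, depth, p_clonal) < alpha


if __name__ == "__main__":
    print(expected_vaf(0.7))               # clonal monoallelic at 70% purity
    print(looks_subclonal(47, 100, 1.0))   # VAF 0.47 at 100% purity
    print(looks_subclonal(10, 100, 0.7))   # VAF 0.10 at 70% purity
```

Under these assumptions, 47 alt reads out of 100 at 100% purity is not flagged as subclonal, whereas 10 out of 100 at 70% purity is. This replaces an arbitrary 5% margin with a threshold that depends on read depth, but it still ignores copy-number changes and biallelic mutations.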
Thank you for any feedback!