Hi! Sorry if this is a very basic question, but it is something that I haven't been able to solve myself.
I am trying to identify novel genes related to a kind of DNA damage repair deficiency (HRD). To do that, I'm using pan-cancer TCGA data, and I need to discard the genes already known to be related to this DNA repair deficiency. My problem is that most of the already known genes are tumour suppressor genes, which means that both alleles have to be affected to confer deficiency, so I need to demonstrate that.
How can I effectively identify whether a given gene has both alleles mutated in a given sample?
I should clarify that I have no access to the controlled data of the TCGA project, and that I can use R and bash.
Thanks in advance!
I'd like to know this as well!! cBioPortal says this in their FAQ:
Is it possible to determine if a particular mutation is heterozygous or homozygous in a sample? When a sample has 2 mutations in one gene, is it possible to determine whether the mutations are in cis or in trans with each other?
There is currently no way to definitively determine whether a mutation is heterozygous/homozygous or in cis/trans with another mutation. However, you can try to infer the status of mutations by noting the copy number status of the gene and the variant allele frequency of the mutation(s) of interest relative to other mutations in the same sample. The cBioPortal patient/sample view can help you accomplish this.
Specifically in the case of TCGA samples with two mutations in the same gene, you can also obtain access to the aligned sequencing reads from the GDC and check if the mutations are in cis or in trans (if the mutations are close enough to each other).
I found this to be a helpful answer. If I understand correctly, they're saying that if a patient is diploid for a given gene and the mutation has a high VAF (I'm guessing 90% or more), then the mutation is likely homozygous. If the VAF is around 50%, then it's likely a heterozygous mutation. However, VAF is diluted by normal cells in the sample, so if a patient's sample seems impure like in this case, then a VAF of 60% for a diploid gene (like TP53 in their example) can be enough to call it a homozygous mutation.
One way I'd do this is to query a specific gene in the PanCancer study you're interested in (e.g. TP53 in BRCA) and select those combinations of "Copy Number" and "Allele Frequency" where it's clear that the mutation is homozygous. The problem with this is that you'd have to check the tumor purity for each sample (which cBioPortal doesn't show), or at least check the allele frequency distribution for each sample... which means you have to do it sample by sample.
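To make the sample-by-sample check less manual, here is a rough sketch of the reasoning in Python (the same arithmetic ports directly to R). It assumes you already have, for each mutation, the VAF, a purity estimate for the sample, and the total copy number of the gene in the tumor; where those come from is up to you. The function names, the tolerance value and the example records are all made up for illustration, and the model assumes the mutation is clonal and the normal cells are diploid.

```python
def estimate_mutated_copies(vaf, purity, total_copies, normal_copies=2):
    """Estimate how many tumor copies carry the mutation.

    Inverts VAF = purity * m / (purity * total_copies + (1 - purity) * normal_copies),
    assuming a clonal mutation and normal cells with `normal_copies` copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    alleles_per_cell = purity * total_copies + (1 - purity) * normal_copies
    return vaf * alleles_per_cell / purity


def classify(vaf, purity, total_copies, tolerance=0.25):
    """Rough call: are all tumor copies of the gene mutated?

    `tolerance` is an arbitrary slack (hypothetical value) for noisy VAFs.
    """
    if total_copies < 1:
        return "no copies left"
    mutated = estimate_mutated_copies(vaf, purity, total_copies)
    if mutated >= total_copies - tolerance:
        return "all copies mutated"
    return "wild-type copy likely retained"


# Hypothetical records: (sample, gene, VAF, purity, total copy number in tumor)
records = [
    ("S1", "TP53", 0.60, 0.60, 2),  # impure sample, diploid gene
    ("S2", "TP53", 0.30, 0.60, 2),  # same purity, half the VAF
    ("S3", "BRCA1", 0.82, 0.90, 1),  # one copy left (mutation plus deletion)
]

for sample, gene, vaf, purity, copies in records:
    print(sample, gene, classify(vaf, purity, copies))
```

A mutation on the only remaining copy (S3) counts as "all copies mutated" here, which is the mutation-plus-deletion pattern you would expect for a tumour suppressor. Subclonal mutations will look like a retained wild-type copy, so treat the output as a first filter rather than proof.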
Seems a bit convoluted for my taste, but if you found an easier way to do this please do tell!
Also keep this in mind:
The mutation allele frequency can be affected by:
Zygosity of the mutation: oftentimes mutations are heterozygous, i.e. only 1 out of the 2 copies in the same cell is mutated.
Clonality: a mutation may be subclonal, i.e. not all tumor cells carry the mutation.
Ploidy: copy numbers may be changed in the tumor, so that there are no longer exactly 2 copies.
Purity of the sample: there may be non-malignant cells in the sample.
For a pure tumor (100% tumor cells), a clonal mutation (occurring in all tumor cells) in a diploid gene (two copies) has an allele frequency of 50% if it is heterozygous (occurring in 1 copy) and 100% if it's homozygous (occurring in both copies).
There are several possible explanations for a 70% allele frequency. For example, it is possible that 70% of the sample are tumor cells and all tumor cells carry the homozygous mutation. It is also possible that the sample is a pure tumor, but there are 3 copies of the gene and 2 of them carry the mutation (which gives about 67%). And there are other possibilities. You would need to look at the data in more detail in order to decide which one is likely the case.
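The numbers above can be reproduced with a small calculation. This is only a sketch of the usual mixing model, assuming normal cells carry 2 unmutated copies and all tumor cells share the same copy number; `expected_vaf` and its parameter names are my own, not from any package.

```python
def expected_vaf(purity, total_copies, mutated_copies, ccf=1.0, normal_copies=2):
    """Expected variant allele frequency under a simple mixing model.

    purity         fraction of cells in the sample that are tumor cells
    total_copies   copies of the gene per tumor cell
    mutated_copies copies carrying the mutation in a mutated tumor cell
    ccf            fraction of tumor cells carrying the mutation (1.0 = clonal)
    normal_copies  copies per normal cell (assumed unmutated)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    mutant_alleles = purity * ccf * mutated_copies
    all_alleles = purity * total_copies + (1 - purity) * normal_copies
    return mutant_alleles / all_alleles


# Pure tumor, diploid gene: heterozygous vs homozygous
print(round(expected_vaf(1.0, 2, 1), 3))  # 0.5
print(round(expected_vaf(1.0, 2, 2), 3))  # 1.0

# Two different routes to roughly 70%
print(round(expected_vaf(0.7, 2, 2), 3))  # 0.7   (70% purity, homozygous)
print(round(expected_vaf(1.0, 3, 2), 3))  # 0.667 (pure, 2 of 3 copies mutated)
```

Since quite different combinations of purity, copy number and clonality give nearly the same VAF, the VAF alone cannot settle the zygosity; you need at least a purity estimate and the copy number of the gene alongside it.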