I am trying to understand what factors could lead to erroneous identification of high-burden mutations when ethnicity is not controlled for, and how this needs to be addressed. What other QC metrics need to be taken care of when studying driver mutations in cancer from next-generation sequencing data?
Really this is two questions, as ethnicity and rare variant analysis on the one hand and studying driver mutations in cancer on the other are quite different from one another. Ethnicity and rare variant analysis is a very broad topic; it does affect cancer driver mutation studies, of course, because it affects all of genomics. In your case, for burden testing, the main thing is to filter common polymorphisms, and less common but not 'rare' variants, out of the genomic dataset. This is primarily to remove germline variants when matched normal germline sequencing was not done alongside the tumour. Even when matched normal sequencing was done, we would be suspicious of a putative somatic mutation that is a known polymorphism in normal control populations. Either it should have been called in the normal tissue but wasn't (making it a false positive as a somatic variant), or it is a genuinely somatically acquired mutation that, given its status as a polymorphism in normal populations, is unlikely to be pathogenic and is just one of the many passenger mutations that tumour cells acquire. Either way, it is extremely unlikely to be a driver mutation.
In cancer genomics, and even when studying Mendelian diseases and the like, we always want a control population that is representative of the population we are studying, but we also typically filter for rare mutations by comparing against all studied populations, with some caveats at times. Typically, if a mutation looks rare or novel in our population but is common in another ethnic population, we would still consider it 'not rare' and filter it from our results, as it is unlikely to be pathogenic. When you get into studying common diseases, highly polygenic diseases, etc., ethnic matching and allele frequency control become much more involved.
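As a rough illustration of the "common in any population means not rare" rule described above, here is a minimal Python sketch. The record layout, the population labels, and the 0.001 cutoff are all assumptions made for the example; the thread does not name a threshold, and real frequencies would come from whatever population database you annotate against.

```python
from typing import Dict, List

# Hypothetical cutoff: the thread does not specify what counts as 'rare'.
RARE_AF_CUTOFF = 0.001


def max_population_af(pop_af: Dict[str, float]) -> float:
    """Highest allele frequency seen in any reference population.

    A variant absent from every population (empty dict) is treated as AF 0.
    """
    return max(pop_af.values(), default=0.0)


def filter_rare(variants: List[dict], cutoff: float = RARE_AF_CUTOFF) -> List[dict]:
    """Keep variants that are rare in every population, not just our own."""
    return [v for v in variants if max_population_af(v["pop_af"]) < cutoff]


# Toy records with made-up frequencies, purely for illustration.
variants = [
    {"id": "var1", "pop_af": {"EUR": 0.0, "AFR": 0.0, "EAS": 0.0}},
    # Rare in EUR but common in AFR: still treated as 'not rare' and dropped.
    {"id": "var2", "pop_af": {"EUR": 0.0002, "AFR": 0.05, "EAS": 0.0}},
    # Not seen in any reference population.
    {"id": "var3", "pop_af": {}},
]

kept = [v["id"] for v in filter_rare(variants)]
print(kept)
```

Taking the maximum across populations, rather than the frequency in the matched population only, is what drops `var2` here even though it looks rare in one group.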
updated 5.5 years ago by Ram • written 9.4 years ago by DG
Thank you so much, Dan. This is exactly what I needed to know. Could you please elaborate on how this burden test differs from GWAS, which looks for rare variants with small effects in germline samples?
updated 5.5 years ago by Ram • written 9.4 years ago by MAPK
Burden tests are typically done on a gene-wise basis: you look for genes with a higher proportion of somatic mutations than you would expect by chance.
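To make "more than you would expect by chance" concrete, here is a deliberately simplified Python sketch. It assumes a single uniform background mutation rate per base per sample and a Poisson model for the mutation count in a gene; the function names and all the numbers are hypothetical, and published methods generally model the background rate in far more detail than this.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def gene_burden_pvalue(observed: int, gene_length_bp: int,
                       n_samples: int, background_rate: float) -> float:
    """Chance of seeing at least `observed` somatic mutations in one gene.

    Assumes (as a simplification) one uniform background rate per base per
    sample, so the expected count is rate * gene length * number of samples.
    """
    expected = background_rate * gene_length_bp * n_samples
    return poisson_upper_tail(observed, expected)


# Hypothetical cohort: 100 tumours, a 1500 bp gene, background rate 1e-5.
# The expected count under the null is 1.5 mutations.
p_hit = gene_burden_pvalue(observed=12, gene_length_bp=1500,
                           n_samples=100, background_rate=1e-5)
p_null = gene_burden_pvalue(observed=2, gene_length_bp=1500,
                            n_samples=100, background_rate=1e-5)
print(p_hit, p_null)
```

A gene with 12 mutations against 1.5 expected gets a very small p-value, while 2 mutations is unremarkable. In practice you would also correct for testing many genes at once.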
I would recommend reading as many papers as possible to see what sorts of QC and methods they are doing. The best way to learn these things is to immerse yourself in the literature and see what the current standards are in the field. Are you working with tumour-only sequencing data?
updated 5.5 years ago by Ram • written 9.4 years ago by DG
Thank you, Dan.
updated 5.5 years ago by Ram • written 9.4 years ago by MAPK