How can I know if a gene mutation is somatic or germline?
12 weeks ago
Fares • 0

Hello,

I'm wondering how I can tell whether a gene has a germline or a somatic mutation from the data I have.

somatic mutation germline • 395 views

Search COSMIC for a matching entry.

12 weeks ago
LauferVA 4.2k

Arguments proceed in several different ways. I'll present allele frequency (AF) first, but really I'd understand 1. through 5. as complementary heuristics, some or all of which may be leveraged together to build a complete understanding.

1. Making an argument based on Allele Frequency: First, if your read depth is high enough and there is no indication of strand or allelic bias, you can make an inference from the allele frequency itself. This is subject to a few limiting considerations: for instance, if there is also a copy number alteration at the locus, or germline mosaicism, arguments based on AF may fail. Otherwise, let's say you have high-depth sequencing and the variant allele fraction is 0.3. This is a strong indicator of a somatic variant; if the variant were germline, its AF would be expected to be close to 0, 0.5, or 1 (0.5 for a heterozygous variant on an autosome, or on the X chromosome in a female).

Quantitating using the Binomial test:

Null Hypothesis (H0): The variant is germline, hence the expected AF is 0.5 for an autosomal or X-chromosome variant in females (assuming a diploid genome and no copy number variations).

Alternative Hypothesis (H1): The variant is somatic, and the AF can significantly deviate from 0.5.

Total number of trials (n): Total read depth at the variant locus.

Observed success (k): Number of reads supporting the variant allele.

Probability under H0 (p): 0.5 (expected AF for a germline variant).

Assuming you have a total read depth (n) and you know the number of variant reads (k), the probability of observing exactly k variant reads under the null hypothesis is the binomial probability (1), where p = 0.5 is the expected AF under H0:

$$P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k} \qquad (1)$$

The p-value is then the sum of (1) over all read counts at least as extreme as the observed k, i.e. the probability of a result as extreme as k or more so under the null hypothesis.

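In this case, since the observed AF is 0.3, and assuming a high read depth (say, n = 1000), you would calculate the p-value for observing a variant read count at least as far from the expected 500 as the observed 300 out of 1000, if the true AF were 0.5. The test should be two-sided (or one-sided 'less'), because the observed AF lies below 0.5; a one-sided 'greater' test would return a p-value near 1 here. This can be implemented easily in python, R, etc.:

from scipy.stats import binomtest  # binomtest replaces the older binom_test, which newer SciPy no longer provides
# Two-sided exact test: 300 variant reads out of 1000 against an expected AF of 0.5
result = binomtest(300, 1000, p=0.5, alternative='two-sided')
p_value = result.pvalue


in R:

k <- 300; n <- 1000  # variant reads and total read depth
result <- binom.test(k, n, p = 0.5, alternative = "two.sided") # note: binom.test is in base R
print(result)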
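
To make the role of read depth concrete, here is a minimal sketch of the same two-sided exact test written with only the Python standard library. The function names `binom_pmf` and `binom_two_sided_pvalue` are hypothetical helpers, and the two-sided rule used (summing all outcomes no more likely than the observed one) is an assumption about how "as extreme" is defined. It requires 0 < p < 1.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials (computed in log space)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def binom_two_sided_pvalue(k, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed k."""
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    total = sum(pr for pr in (binom_pmf(i, n, p) for i in range(n + 1)) if pr <= cutoff)
    return min(1.0, total)

# Same AF of 0.3 at three different read depths (illustrative numbers only).
for alt_reads, depth in [(3, 10), (30, 100), (300, 1000)]:
    pval = binom_two_sided_pvalue(alt_reads, depth)
    print(f"{alt_reads}/{depth} variant reads: p = {pval:.3g}")
```

At a depth of 10, an AF of 0.3 gives a p-value of about 0.34, so it cannot be distinguished from a heterozygous germline variant; at a depth of 1000 the same AF is overwhelmingly inconsistent with 0.5.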


As alluded to above, relying on 1. (logic relating to allele frequency alone) is not sufficient in every case. Additional tests are therefore used, which, depending on the scenario, can increase confidence that an inference based on AF is correct.
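
One way to see why AF alone can mislead is to list the AFs a germline variant could show when the locus is not diploid. The sketch below assumes a pure sample in which every cell carries the same total copy number, and `expected_germline_afs` is a hypothetical helper written for illustration.

```python
from fractions import Fraction

def expected_germline_afs(total_copies):
    """Possible AFs of a germline variant carried on 1..total_copies copies of the locus."""
    return [Fraction(m, total_copies) for m in range(1, total_copies + 1)]

for copies in (2, 3, 4):
    afs = ", ".join(f"{float(af):.2f}" for af in expected_germline_afs(copies))
    print(f"{copies} copies: {afs}")
```

With three copies at the locus, a germline variant can sit near 0.33, which is hard to tell apart from an observed AF of 0.3.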

2. Perform Tumor-Normal sequencing. By sequencing a tumor as well as healthy cells from a patient, the allele frequency of a variant in the tumor and healthy cells can be compared. Interpreting such data can be tricky, though. A result such as "the variant has the same frequency in both tumor and normal, and both are near 0.5" would suggest the variant is germline, while a result such as "the variant is absent from the patient's normal cells, but is found at a high allele fraction (AF) in the tumor cells" suggests that the variant arose somatically. Aside from presence/absence, T-N sequencing can also help resolve issues relating to copy number alteration that has also arisen somatically at the same locus. However, there are a lot of edge cases here that aren't as straightforward to interpret.
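
The tumor versus normal comparison can be sketched as a Fisher's exact test on the alt/ref read counts, followed by a rough label. This is only an illustration: `classify_tumor_normal`, the `alpha` cutoff and the `max_normal_vaf` threshold are hypothetical choices, not values from any caller or guideline, and both samples are assumed to have non-zero depth.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the col1 alt reads fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    cutoff = prob(a) * (1 + 1e-7)  # tolerance for floating-point ties
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(pr for pr in map(prob, range(lo, hi + 1)) if pr <= cutoff))

def classify_tumor_normal(t_alt, t_ref, n_alt, n_ref, alpha=0.01, max_normal_vaf=0.02):
    """Rough label from tumor/normal read counts; thresholds are illustrative only."""
    normal_vaf = n_alt / (n_alt + n_ref)
    differs = fisher_exact_two_sided(t_alt, t_ref, n_alt, n_ref) < alpha
    if differs and normal_vaf <= max_normal_vaf:
        return "likely somatic"
    if not differs and normal_vaf > max_normal_vaf:
        return "likely germline"
    return "ambiguous"

print(classify_tumor_normal(30, 70, 0, 100))   # absent in normal, present in tumor
print(classify_tumor_normal(48, 52, 51, 49))   # near 0.5 in both samples
print(classify_tumor_normal(60, 40, 20, 80))   # present in both, at different AFs
```

The third call lands in the "ambiguous" bucket on purpose: a variant present in both samples at different AFs is one of the edge cases (copy number change, mosaicism, tumor cells in the normal sample) that read counts alone do not resolve.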

3. Dealing with mosaicism. A variant may be neither strictly germline nor somatic; it may instead arise post-zygotically, for instance in a germ cell. In this scenario, the variant might be found in many of a person's cells, but not all of them, and depending on which cell it arises in and when, it may or may not form part of that person's (here, the proband, the fetus in the F1 generation) germline. In such cases, various steps are taken. Clinically, it is common to take cells from tissues of different embryological origins; in addition to a tissue of interest or blood, skin biopsy is presently most commonly used, although doing this as a stand-alone measure has recently been impugned, e.g. by Lupski et al. In such a case, knowing the variant is actually a mosaic, de novo variant could prevent the mistaken inference that a variant was somatic and arose in tumorigenesis, or even that it was a driver of tumorigenesis.

4. Combining AF information with other heuristics. As Ram suggested, presence of the variant of interest in germline databases, somatic databases, both, or neither can help you lean one way or another. E.g. a variant with an appreciable population frequency is assumed unlikely to have arisen de novo or somatically, in the absence of additional information suggesting otherwise (see 5., below). In addition, as also suggested above, if the database record also reports a phenotype that is consistent (or, better yet, identical) with the phenotype seen in the proband, this is helpful as well, though it may or may not provide insight into whether the variant is germline or somatic.

5. Trio sequencing. Finally, trio sequencing (or genotyping) can effectively disprove that a variant is of somatic origin: a variant that is also present in a parent was inherited. It is also helpful, but not definitive, for proving that a variant is germline. In other words, when a variant is absent from both parents, trio sequencing alone doesn't distinguish between de novo and somatic variants, and it is subject to phenomena such as clonal spermatogenesis as well. Despite this limitation, trio sequencing finds application in several important contexts. As an example, consider a proband with a cancer predisposition variant; trio sequencing could help to guide which of the proband's family members, if any, should also undergo testing for the variant.
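
The trio logic described above can be written down as a small decision function. This is a sketch under simplifying assumptions: genotypes are given as alt-allele counts (0, 1 or 2), the calls are taken as correct, and `interpret_trio` is a hypothetical helper that ignores complications such as parental mosaicism.

```python
def interpret_trio(proband_alt, mother_alt, father_alt):
    """Interpret alt-allele counts (0, 1 or 2) for proband, mother and father."""
    if proband_alt == 0:
        return "variant not called in proband"
    if mother_alt > 0 or father_alt > 0:
        # Seen in a parent: inherited, so a somatic origin is effectively excluded.
        return "inherited germline variant; somatic origin effectively excluded"
    # Absent from both parents: trio data alone cannot separate these possibilities.
    return "absent from both parents; de novo, mosaic, or somatic (not resolved by trio data alone)"

print(interpret_trio(1, 0, 1))
print(interpret_trio(1, 0, 0))
```

The asymmetry is the point: a parental carrier settles the question, while absence from both parents leaves it open and calls for the other heuristics (AF, tumor-normal comparison, additional tissues).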