Question: Difference Between Somatic And Germline Variant Calling?
William wrote (7.5 years ago):

Can someone explain the differences between somatic and germline variant calling, both in theory and in practice? Or point me to some papers that explain the difference.

I am used to calling variants across multiple individuals of a species using GATK. Apparently you can't just use multisample GATK variant calling to call somatic variants across multiple samples (not just a tumor/normal pair). Why not?

Tags: variant-calling, somatic, gatk
Chris Miller (Washington University in St. Louis, MO) wrote (7.5 years ago):

To rehash/expand on what Dan said: if you're sequencing normal tissue, you generally expect the variant allele fraction at single-nucleotide variant sites to fall into one of three bins, 0%, 50%, or 100%, depending on whether the site is homozygous reference, heterozygous, or homozygous for the variant.
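To make the three bins concrete, here is a toy diploid genotyper that scores each genotype by the binomial likelihood of the observed alt-read count, with expected fractions of roughly 0, 0.5, and 1. It is a minimal sketch assuming a single flat per-read error rate (the 1% default and the function names are illustrative assumptions); real germline callers such as GATK also use per-base qualities, priors, and more.

```python
import math

def genotype_log_likelihoods(alt_reads, depth, error_rate=0.01):
    """Binomial log-likelihood of the alt-read count under each diploid genotype.

    Expected alt-allele fraction: hom-ref ~ error_rate, het = 0.5, hom-alt ~ 1 - error_rate.
    """
    expected_vaf = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    ref_reads = depth - alt_reads
    log_choose = math.log(math.comb(depth, alt_reads))
    return {
        gt: log_choose + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in expected_vaf.items()
    }

def call_genotype(alt_reads, depth, error_rate=0.01):
    """Return the diploid genotype with the highest likelihood."""
    lls = genotype_log_likelihoods(alt_reads, depth, error_rate)
    return max(lls, key=lls.get)

print(call_genotype(14, 30))  # 0/1
print(call_genotype(29, 30))  # 1/1
print(call_genotype(1, 30))   # 0/0
```

Every site is forced into one of the three bins; that built-in assumption is what breaks down for tumors.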

With tumors, you have to deal with a whole host of other factors:

  1. Normal admixture in the tumor sample: lowers variant allele fraction (VAF)
  2. Tumor admixture in the normal - this occurs when adjacent normals are used, or in hematological cancers, when there is some blood in the skin normal sample
  3. Subclonal variants, which may occur in any fraction of the cells, meaning that your het-site VAF might be anywhere from 50% down to sub-1%, depending on the tumor's clonal architecture and the sensitivity of your method
  4. Copy number variants, copy-neutral loss of heterozygosity (LOH), or ploidy changes, all of which again shift the expected distribution of variant fractions
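Several of these factors can be combined into a commonly used simplified formula for the expected VAF of a somatic SNV in a bulk tumor sample: tumor purity times cancer cell fraction times mutant copies, divided by the average total copy number across tumor and normal cells. The sketch below assumes the variant sits on `mutant_copies` copies in every tumor cell that carries it and that normal cells have `normal_cn` copies; it ignores tumor-in-normal contamination (point 2). The function and parameter names are hypothetical.

```python
def expected_vaf(purity, ccf=1.0, mutant_copies=1, tumor_cn=2, normal_cn=2):
    """Expected variant allele fraction in a bulk tumor sample (simplified model).

    purity        -- fraction of cells in the sample that are tumor cells
    ccf           -- cancer cell fraction: share of tumor cells carrying the variant
    mutant_copies -- copies of the variant allele in each cell that carries it
    tumor_cn      -- total copy number at the locus in tumor cells
    normal_cn     -- total copy number at the locus in normal cells (2 for autosomes)
    """
    variant_copies = purity * ccf * mutant_copies
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return variant_copies / total_copies

print(round(expected_vaf(1.0), 3))                   # 0.5   pure tumor, clonal het
print(round(expected_vaf(0.6), 3))                   # 0.3   normal admixture (point 1)
print(round(expected_vaf(0.6, ccf=0.2), 3))          # 0.06  subclone in 20% of tumor cells (point 3)
print(round(expected_vaf(0.6, mutant_copies=2), 3))  # 0.6   copy-neutral LOH of the variant (point 4)
print(round(expected_vaf(0.6, tumor_cn=1), 3))       # 0.429 loss of the reference copy (point 4)
```

None of these values lands in the 0/50/100% bins, which is why a diploid genotyping model is a poor fit.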

These, and other factors, make calling somatic variants difficult and still an area that is being heavily researched. If someone tells you that somatic variant calling is a solved problem, they probably have never tried to call somatic variants.


Sounds like somatic/tumor variant calling is something that will be solved by improvements on the wet-lab side (single-cell selection/amplification/sequencing) rather than on the computational side.

(reply by William, 7.5 years ago)

Well, single-cell sequencing has a role to play (and would have more of one if whole-genome amplification (WGA) weren't so lossy), but realistically, you can't sequence billions of cells from a tumor individually. Bulk sequencing is still going to have a role for quite a while.

(reply by Chris Miller, 7.5 years ago)

Hell, germline calling isn't even a solved problem. You still get lots of false positives (and false negatives). It just tends to work so well that it is hard to improve much, except by making it faster, less memory-intensive, etc.

(reply by DG, 7.5 years ago)

"Solved" was the wrong word; I just meant "improved". There is only so much you can do on the computational side. The wet lab also has its part to play.

(reply by William, 7.5 years ago)

Sorry, my comment wasn't a knock against what you posted at all. I was just reiterating that, for all of the vast improvements made in germline calling, it is still a difficult problem with lots of room for improvement, and somatic variant calling is even tougher. Your post was excellent.

(reply by DG, 7.5 years ago)

What do you mean by 'three bins: 0%, 50%, or 100%'? Thanks

(reply by checkyodna, 2.9 years ago)

In a diploid genome, you're going to have either 0/2, 1/2, or 2/2 copies of that allele.

(reply by Chris Miller, 2.9 years ago)
DG wrote (7.5 years ago):

A germline variant caller generally has a ploidy-based genotyping step built into the algorithm/pipeline. IIRC, the GATK UnifiedGenotyper, for instance, does both variant calling and then genotype calling. So to call a genotype for a variant, it expects a certain number of reads to support the alternative allele. With somatic variants, all of the assumptions about how many variant-supporting reads you expect at a position, which are used to distinguish true from false positives, are no longer valid. Except for mutations fixed throughout the tumor population, only some proportion of cells will carry a given somatic variant. You also typically have some contamination from normal, non-cancerous cells. Add in the complications from significant genomic instability, with lots of copy number variation and such, and you need a major change in your model for calling variation while minimizing artifactual calls. That is why a host of other programs have been developed specifically for calling somatic variation in tumor samples.
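A toy comparison illustrates the point about ploidy assumptions. This is a sketch, not how GATK or any dedicated somatic caller actually works: a diploid genotyper forces a site into the 0/0, 0/1, or 1/1 bins, so a subclonal variant at 8% VAF is called homozygous reference, while a simple binomial test against an assumed 1% sequencing error rate, with no ploidy assumption, flags the same site. The function names, error rate, and `alpha` threshold are illustrative assumptions.

```python
import math

def diploid_genotype(alt_reads, depth, error_rate=0.01):
    """Germline-style call: the diploid genotype (VAF ~0, 0.5, ~1) with the highest binomial likelihood."""
    expected = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}

    def loglik(p):
        return alt_reads * math.log(p) + (depth - alt_reads) * math.log(1.0 - p)

    return max(expected, key=lambda gt: loglik(expected[gt]))

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def somatic_evidence(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Toy somatic test: are there more alt reads than sequencing error alone would explain?

    No ploidy assumption: any VAF clearly above the error background counts as evidence.
    """
    return binomial_upper_tail(alt_reads, depth, error_rate) < alpha

print(diploid_genotype(8, 100))  # 0/0   the 8% VAF subclonal variant is missed
print(somatic_evidence(8, 100))  # True  but it stands well above the error background
print(somatic_evidence(2, 100))  # False 2 alt reads are consistent with error alone
```

Real somatic callers go much further, e.g. using the matched normal to exclude germline variants and recurrent artifacts; this toy test does neither.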
