Question: somatic calls by somatic SNV caller differ a lot when compared with cancer minus normal germline calls
19 months ago by
DVA500
United States
DVA500 wrote:

I have a few cancer samples that were analyzed with the GATK germline pipeline (SNVs called per sample, not in a cohort setting). Recently we got the corresponding normal samples sequenced, and I ran GATK on them as well.

I obtained one set of somatic calls by subtracting the normal germline calls from the corresponding cancer germline calls. Then I ran Strelka on each cancer/normal pair. Finally, for each pair, I compared the Strelka somatic calls with the calls obtained by germline subtraction.

To my surprise, they are very different. Only 20%-40% of positions match, depending on the sample. To my knowledge, a match of 75%+ is expected. The level of inconsistency makes me hesitate to move further in this project. Any thoughts on this? (Default settings were used for all calling; my samples are all covered at 30X+.)
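For anyone reproducing the comparison, here is a minimal sketch of how a "percent of positions matching" figure can be computed. The question does not say which denominator was used, so the function reports several; the function name and the example calls are made up for illustration.

```python
def concordance(calls_a, calls_b):
    """Overlap statistics for two collections of (chrom, pos, ref, alt) calls."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Fraction of each call set that is also found in the other one.
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
        # Shared calls divided by all calls seen in either set.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up calls, for illustration only.
subtracted = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
              ("chr2", 50, "C", "T"), ("chr2", 75, "T", "G")}
strelka = {("chr1", 100, "A", "T"), ("chr3", 10, "G", "A")}

stats = concordance(subtracted, strelka)
print(stats)
```

The same pair of call sets can look 20%, 25% or 50% concordant depending on the denominator, so it is worth stating which one is meant when quoting a number.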

[A little more detail about how I did the subtraction, in case it's relevant: I know that, unlike a gVCF, a normal VCF does not record positions that are not sequenced well, so I ignored the mismatched positions (a very small portion anyway) between the two germline VCF files, and only looked at the change in heterozygosity (genotype) at each position.]
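The subtraction described above might look roughly like the sketch below. This is a guess at the procedure, not the poster's actual script: it assumes single-sample, uncompressed VCF text with a GT field, and the function names and example records are hypothetical.

```python
def read_genotypes(vcf_lines):
    """Map (chrom, pos, ref, alt) -> sorted allele tuple from a single-sample VCF."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes alike; ignore allele order.
        genotypes[(chrom, pos, ref, alt)] = tuple(sorted(gt.replace("|", "/").split("/")))
    return genotypes


def subtract(tumour_gt, normal_gt):
    """Sites present in BOTH files whose genotype differs between tumour and normal."""
    shared_sites = tumour_gt.keys() & normal_gt.keys()
    return {site: (normal_gt[site], tumour_gt[site])
            for site in shared_sites
            if tumour_gt[site] != normal_gt[site]}


def record(chrom, pos, ref, alt, gt):
    """Build one made-up VCF data line for the example."""
    return "\t".join([chrom, str(pos), ".", ref, alt, "50", "PASS", ".", "GT:DP", gt + ":30"])


normal_vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL",
              record("chr1", 100, "A", "T", "0/1"),
              record("chr1", 200, "G", "C", "0/1")]
tumour_vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOUR",
              record("chr1", 100, "A", "T", "1/1"),   # het -> hom: reported
              record("chr1", 200, "G", "C", "1/0"),   # same genotype: not reported
              record("chr2", 50, "C", "T", "0/1")]    # tumour-only site: ignored

candidates = subtract(read_genotypes(tumour_vcf), read_genotypes(normal_vcf))
print(candidates)
```

Note that, as written, this drops every site seen in only one of the two files, so a variant called in the tumour and simply absent from the normal VCF never shows up as a candidate.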

snp snv somatic • 1.0k views
modified 19 months ago by d-cameron2.0k • written 19 months ago by DVA500
19 months ago by
scilifelab.se
szilveszter.juhos10 wrote:

Hi, AFAIK running a germline caller (e.g. HaplotypeCaller) on both the normal and the tumour sample and subtracting the calls to get somatic variants is suboptimal: somatic variants usually have low allele frequencies, so you need a somatic caller such as MuTect2 or Strelka. We use the germline caller only for QC, to make sure we are not mixing up matched samples (and to get germline variants, of course). If your tissue samples are highly heterogeneous (a high percentage of normal cells in the tumour tissue, or multiple clones), it is not surprising to get low concordance with a somatic caller.

written 19 months ago by szilveszter.juhos10

Thank you for the reply. By low allele frequency, do you mean that one of the alleles is supported by only a limited number of reads, but the germline caller somehow still captured it? Out of curiosity, have you done such a comparison using your own data?

written 19 months ago by DVA500
19 months ago by
d-cameron2.0k
Australia
d-cameron2.0k wrote:

Every caller will only call variants above a certain threshold. If you do independent variant calling on your germline and tumour, then a (possibly large) portion of your germline variants will be near that threshold and will just happen to fall below it in one sample and above it in the other. This will result in a number of false positive somatic/somatic LOH calls.

By joint calling using a somatic caller, your effective germline coverage is the combined coverage (i.e. 60x), and your somatic call set will contain fewer germline variants that are incorrectly called as somatic.
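The threshold effect and the benefit of pooled coverage can be illustrated with a toy model. This is only a sketch under strong assumptions: the "caller" here calls a site whenever it sees at least `min_alt_reads` alt-supporting reads, and alt read counts are binomial. Real callers such as GATK or Strelka use likelihood models rather than a fixed read count, and the depths and threshold below are made-up values chosen to represent a poorly covered site.

```python
from math import comb


def p_called(depth, min_alt_reads, af=0.5):
    """P(at least min_alt_reads alt reads) when alt reads ~ Binomial(depth, af)."""
    return sum(comb(depth, k) * af**k * (1 - af)**(depth - k)
               for k in range(min_alt_reads, depth + 1))


def p_discordant(depth_normal, depth_tumour, min_alt_reads, af=0.5):
    """P(a germline variant is called in exactly one of two independent runs)."""
    p_n = p_called(depth_normal, min_alt_reads, af)
    p_t = p_called(depth_tumour, min_alt_reads, af)
    return p_n * (1 - p_t) + p_t * (1 - p_n)


# A heterozygous germline site with only 10 usable reads in each sample.
p_single = p_called(10, 4)
p_mismatch = p_discordant(10, 10, 4)
# The same site when the reads from both samples are considered together.
p_pooled = p_called(20, 4)

print(f"called in one sample:         {p_single:.3f}")
print(f"called in exactly one of two: {p_mismatch:.3f}")
print(f"called with pooled reads:     {p_pooled:.3f}")
```

Under these assumptions, a site near the threshold is called in exactly one of the two independent runs a sizeable fraction of the time, and each such site looks like a somatic or LOH event after subtraction. Pooling the evidence makes the germline variant much more likely to be recognised as germline.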

Purity, aneuploidy, and sub-clonality all affect the allele frequency of somatic calls such that, unlike germline calls where an AF of 0, 0.5 or 1 is expected, somatic variant allele frequencies can take a range of values. As such, it is not surprising that a germline caller such as GATK, which genotypes variants using a diploid model, does not perform well on somatic samples.
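To make the range of possible values concrete, here is a sketch of the commonly used expected-VAF relationship: mutant copies contributed by tumour cells divided by all copies at the locus across tumour and normal cells. The function and parameter names are my own, and the example values are made up.

```python
def expected_vaf(purity, tumour_copy_number, mutated_copies=1,
                 cancer_cell_fraction=1.0, normal_copy_number=2):
    """Expected variant allele frequency of a somatic variant in a mixed sample.

    purity:               fraction of cells in the sample that are tumour cells
    tumour_copy_number:   total copies of the locus in each tumour cell
    mutated_copies:       copies carrying the variant in each mutated tumour cell
    cancer_cell_fraction: fraction of tumour cells that carry the variant
    """
    mutant = purity * cancer_cell_fraction * mutated_copies
    total = purity * tumour_copy_number + (1 - purity) * normal_copy_number
    return mutant / total


# Pure, diploid, clonal heterozygous variant: looks like a germline het.
vaf_ideal = expected_vaf(purity=1.0, tumour_copy_number=2)
# Same variant in a sample that is half normal cells.
vaf_impure = expected_vaf(purity=0.5, tumour_copy_number=2)
# 60% purity, three copies of the locus, variant present in half the tumour cells.
vaf_subclonal = expected_vaf(purity=0.6, tumour_copy_number=3,
                             cancer_cell_fraction=0.5)

print(vaf_ideal, vaf_impure, round(vaf_subclonal, 3))
```

Only the first case lands on a frequency a diploid genotyper expects; the other two sit between 0 and 0.5, where a diploid model has to choose between hom-ref and het.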

modified 19 months ago • written 19 months ago by d-cameron2.0k