Question: Doesn't Base Quality Score Recalibration degrade sensitivity in heavily mutated cancers?
6.4 years ago by Cyriac Kandoth (5.5k), Memorial Sloan Kettering, New York, USA, wrote:

From the GATK docs: [BQSR] assumes that all reference mismatches are errors and indicative of poor base quality, which is why we have to give it a list of dbSNP sites to skip over. But what about somatic SNPs? Wouldn't data from hypermutated tumors (uterine, colorectal, melanoma, or lung cancers) be recalibrated to lower qualities than data from AML or breast cancer? Variant caller sensitivity would then drop accordingly. Or is this not a big deal in practice?

As a test, I will try to do some high-confidence SNP calling on uncalibrated uterine cancer BAMs, append those calls to the dbSNP VCF for BQSR, and redo variant calling. Then I'll compare these calls to those from the standard BQSR BAM, recalibrated using only dbSNP.
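
The comparison step of that test could be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: the helper names `load_calls` and `compare_calls` are hypothetical, the records are made up, and a real comparison would read the two VCFs produced by the variant caller.

```python
def load_calls(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-style text lines."""
    calls = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            calls.add((chrom, int(pos), ref, allele))
    return calls


def compare_calls(standard, bootstrapped):
    """Split two call sets into shared and pipeline-specific variants."""
    return {
        "shared": standard & bootstrapped,
        "only_standard": standard - bootstrapped,
        "only_bootstrapped": bootstrapped - standard,
    }


# Made-up records for illustration only, not real results.
standard_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tC\tT\t50\tPASS\t.",
    "1\t250\t.\tG\tA\t60\tPASS\t.",
]
bootstrapped_vcf = standard_vcf + ["2\t75\t.\tC\tT\t40\tPASS\t."]

summary = compare_calls(load_calls(standard_vcf), load_calls(bootstrapped_vcf))
for label, variants in summary.items():
    print(label, len(variants))
```

If BQSR with dbSNP alone really costs sensitivity, the "only_bootstrapped" set should be noticeably larger than the "only_standard" set on real data.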

sequencing snp • 5.3k views
modified 4.5 years ago by Biostar ♦♦ 20 • written 6.4 years ago by Cyriac Kandoth (5.5k)

Looking forward to the results of your test. I have observed a similar problem when trying to find novel variants. Base Quality Score Recalibration will probably decrease a variant caller's chances of detecting novel SNPs.

written 6.4 years ago by kautilya410

6.3 years ago by lh3 (32k), United States, wrote:

No, BQSR won't affect sensitivity. The list of dbSNP sites is used to generate the recalibration table. A more complete list of variants helps to yield higher calibrated quality. It is not the case that a variant seen in dbSNP will get a higher quality than a novel variant.

That said, Illumina raw quality is pretty good these days. For high-coverage samples, BQSR is frequently not necessary IMO.

written 6.3 years ago by lh3 (32k)

BQSR is not needed since Illumina base qualities are quite good nowadays. But it is still a standard part of many reference alignment pipelines, because it's recommended by GATK's best practices for DNA-seq. So it inevitably gets used in cancer genomics pipelines, where there are real variants at various allele fractions. Since these variants are not in dbSNP, they will be counted as sequencing artifacts when the recalibration table is generated, which inevitably "corrects for" these real variants and reduces sensitivity.

I still haven't done my test, so can't confirm this assumption yet. But does it make sense?

modified 6.2 years ago • written 6.3 years ago by Cyriac Kandoth (5.5k)

No, no. The recalibration table WON'T "correct for these real variants".

written 6.3 years ago by lh3 (32k)

OK. Then I might have misunderstood the purpose of BQSR. Can you take a look at their docs here, or here's a shortened excerpt:

[BQSR] tabulates and bins data about features of the bases (read group, dinucleotide context, etc.). It counts the number of bases within each bin and how often such bases mismatch the reference base, excluding loci known to vary in the population (dbSNP). The new recalibrated quality scores are based on the sum of the global difference between reported quality scores and the empirical quality.

In cancers, there will be a lot of real variants in very distinctive dinucleotide contexts, for example those caused by UV exposure and cigarette smoke. Or when specific DNA-repair genes (like POLE, MLH1, etc.) are disabled, you get more variants of the type that they were responsible for repairing. All of these are real variants in reads whose qualities BQSR will downgrade, because they add mismatches against the reference sequence and so lower the empirical quality.
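
To put rough numbers on this, here is a toy model of a single covariate bin. It is a sketch under stated assumptions, not GATK's actual implementation: the function names, the +1/+2 smoothing, the error rate, the bin size, and the mutation densities are all illustrative choices. It treats every read that carries an unmasked real variant as one extra mismatch in the bin.

```python
import math


def empirical_quality(mismatches, bases):
    """Phred-scaled mismatch rate, with +1/+2 smoothing to avoid log(0)."""
    return -10 * math.log10((mismatches + 1) / (bases + 2))


def bin_quality(bases, error_rate, variant_fraction=0.0, allele_fraction=0.5):
    """Expected empirical quality of one covariate bin.

    variant_fraction: share of the bin's bases that sit on unmasked real
    variant sites; allele_fraction: share of reads there carrying the variant.
    """
    error_mismatches = bases * error_rate
    variant_mismatches = bases * variant_fraction * allele_fraction
    return empirical_quality(error_mismatches + variant_mismatches, bases)


bases = 100_000_000  # assumed number of bases observed in the bin
error_rate = 1e-3    # assumed true sequencing error rate (about Q30)

q_masked = bin_quality(bases, error_rate)          # all real variants masked
q_genome = bin_quality(bases, error_rate, 1e-4)    # 100 unmasked variants per Mb
q_context = bin_quality(bases, error_rate, 1e-3)   # 10x enrichment in one context

print(f"masked: {q_masked:.2f}  genome-wide: {q_genome:.2f}  enriched bin: {q_context:.2f}")
```

Under these assumptions the shift depends on how strongly the real variants are concentrated in a bin and on how low the true error rate is; different inputs will give different answers, so the model only shows the mechanism, not the size of the effect in real data.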

modified 6.2 years ago • written 6.3 years ago by Cyriac Kandoth (5.5k)

6.4 years ago by Cyriac Kandoth (5.5k), Memorial Sloan Kettering, New York, USA, wrote:

The official answer from GATK devs is down here. In short, this was "...not investigated, but it sounds like a use case where the BQSR would benefit from generating a bootstrap set of variants". They go on to give a longer explanation of the test I mentioned in the question, and add that "This should compensate for the risk of counting real mutations as errors in hypermutated cancer tissue. But please understand that it's a theoretical solution that we haven't tested out ourselves, so we can't guarantee results".
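
Building such a bootstrap set might look roughly like the sketch below. It assumes a first-pass call set in VCF-style text, and the helper names, the `min_qual` threshold, and the records are all hypothetical; the merged sites would still have to be written out as a VCF and supplied to BQSR as known sites alongside dbSNP.

```python
def high_confidence_sites(vcf_lines, min_qual=100.0):
    """Return (chrom, pos) for unfiltered calls at or above min_qual."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        chrom, pos, qual, flt = fields[0], int(fields[1]), fields[5], fields[6]
        if flt in ("PASS", ".") and qual != "." and float(qual) >= min_qual:
            sites.add((chrom, pos))
    return sites


def bootstrap_known_sites(dbsnp_sites, first_pass_vcf_lines, min_qual=100.0):
    """Union of dbSNP sites and high-confidence first-pass calls."""
    extra = high_confidence_sites(first_pass_vcf_lines, min_qual)
    return sorted(set(dbsnp_sites) | extra)


# Made-up records for illustration only.
dbsnp_sites = {("1", 100)}
first_pass = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tC\tT\t500\tPASS\t.",     # already in dbSNP
    "2\t75\t.\tC\tT\t250\tPASS\t.",      # confident somatic candidate
    "3\t10\t.\tG\tA\t12\tPASS\t.",       # too low quality to trust
    "4\t20\t.\tA\tG\t300\tLowQual\t.",   # failed a filter
]

known_sites = bootstrap_known_sites(dbsnp_sites, first_pass)
print(known_sites)
```

The threshold is the main design choice: set it too low and real sequencing errors get masked along with the real variants, which would inflate the recalibrated qualities instead.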

Update (Jan 24, 2019): GATK devs provided this comment, which acknowledges the issue but also makes a good case that the degradation in sensitivity should be negligible.

modified 21 months ago • written 6.4 years ago by Cyriac Kandoth (5.5k)