From the GATK docs: [BQSR] assumes that all reference mismatches are errors and indicative of poor base quality, which is why we have to give it a list of known variant sites (dbSNP) to skip over. But what about somatic SNPs? Wouldn't data from hypermutated tumors (uterine, colorectal, melanoma, or lung cancers) be recalibrated to lower base qualities than data from AML or breast cancer, with variant caller sensitivity dropping accordingly? Or is this not a big deal in practice?
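To put a rough scale on the worry, here is a back-of-envelope sketch in Python. It is not how GATK computes empirical qualities; it only treats unmasked variant reads as extra "errors" on top of a true error rate. The function names and every rate used with them are my own illustrative assumptions, not measurements.

```python
import math


def phred(error_rate: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_rate)


def apparent_quality(true_error_rate: float,
                     variant_rate: float,
                     allele_fraction: float = 0.5) -> float:
    """Phred quality implied by the mismatch rate when unmasked variants
    are counted as sequencing errors.

    Approximation (assumed, not GATK's model): a fraction `variant_rate`
    of bases are unmasked variant sites, and at those sites a fraction
    `allele_fraction` of reads carry the non-reference allele.
    """
    mismatch_rate = true_error_rate + variant_rate * allele_fraction
    return phred(mismatch_rate)


# Hypothetical inputs: a true error rate of 1e-3 (Q30) and two assumed
# densities of unmasked somatic variants per base.
for label, rate in [("low mutation burden", 1e-6),
                    ("hypermutated", 1e-4)]:
    print(label, round(apparent_quality(1e-3, rate), 2))
```

Under these assumed numbers the shift is a fraction of a Phred unit, but the real answer depends on the actual mutation density, allele fractions, and how BQSR bins its covariates, which is what I'd like to check empirically.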
As a test, I will make high-confidence SNP calls on uncalibrated uterine cancer BAMs, append those calls to the dbSNP VCF given to BQSR as known sites, recalibrate, and redo variant calling. Then I'll compare the resulting calls to calls made from the standard BQSR BAM, recalibrated using only dbSNP.
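For the comparison step, something like the following minimal sketch would do. It assumes both call sets are plain-text VCFs with standard tab-separated columns, and it keys each call on chromosome, position, REF, and ALT only (no normalization, no genotype or filter checks). The function names and dictionary keys are hypothetical, chosen just for this example.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_keys(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect one key per ALT allele from the data lines of a VCF."""
    keys: Set[Variant] = set()
    for line in vcf_lines:
        # Skip blank lines and both kinds of header line (## and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT is compared separately.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_calls(dbsnp_only: Iterable[str],
                  dbsnp_plus_somatic: Iterable[str]) -> Dict[str, int]:
    """Count calls shared by, or unique to, the two recalibration runs."""
    a = variant_keys(dbsnp_only)
    b = variant_keys(dbsnp_plus_somatic)
    return {
        "shared": len(a & b),
        "only_dbsnp_masked": len(a - b),
        "only_dbsnp_plus_somatic_masked": len(b - a),
    }
```

Each argument can be an open file handle, since iterating over a file yields its lines. If masking somatic sites matters, I'd expect the difference to show up as calls unique to the run that used the extended known-sites list.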