Hi there,
I have a tumor and a normal BAM file and am preparing to run base recalibration.
I was planning on calling variants on the normal and using the resulting VCF, in addition to dbSNP, as known sites for recalibrating the tumor BAM(s), e.g.:
gatk BaseRecalibrator \
-I tumor.bam \
-R hg38.fasta \
--known-sites normal.vcf \
--known-sites dbSNP_hg38.vcf \
-O tumor_recal.table
Before producing the normal VCF, however, it's not clear to me whether I should run base recalibration on the normal BAM. If this is advised, I had planned on using dbSNP as the known sites for the normal, e.g.:
gatk BaseRecalibrator \
-I normal.bam \
-R hg38.fasta \
--known-sites dbSNP_hg38.vcf \
-O normal_recal.table
Alternatively, I could keep things simple and run base recalibration on both tumor and normal using dbSNP only.
Is one of these workflows preferable? Any clarity here would be much appreciated. Thanks!
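For reference, the simpler dbSNP-only option might look roughly like the sketch below. It reuses the file names from the commands above. The ApplyBQSR step and the `*_recal.bam` output names are assumptions added for completeness, and `run`, `bqsr` and `DRY_RUN` are hypothetical helpers for this example only. By default the script just prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: dbSNP-only BQSR for both samples, using the file names from the post.
set -euo pipefail

REF=hg38.fasta
KNOWN_SITES=dbSNP_hg38.vcf

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Build the recalibration table, then apply it to the same BAM.
# The ApplyBQSR step is an assumption; the post only shows BaseRecalibrator.
bqsr() {
  local sample=$1
  run gatk BaseRecalibrator -I "${sample}.bam" -R "$REF" \
    --known-sites "$KNOWN_SITES" -O "${sample}_recal.table"
  run gatk ApplyBQSR -I "${sample}.bam" -R "$REF" \
    --bqsr-recal-file "${sample}_recal.table" -O "${sample}_recal.bam"
}

for sample in tumor normal; do
  bqsr "$sample"
done
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them, assuming `gatk` is on the PATH and the input files exist.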
Hi Cyriac, thanks so much for your response, this is very helpful. To be clear: running BQSR with, for example, dbSNP would not so much hurt the output of the analysis as add a cumbersome computational step, yes?
"hurt" is relative. :) Waiving the computational expense, doing BQSR gives you a decent balance between variant detection sensitivity and specificity. But if you care more about sensitivity than specificity, then BQSR will hurt your analysis. See more here.
You have all the answers! Thank you again.