I'm currently testing GATK's tumor-only variant calling pipeline and have run into an issue I'm stuck on. Using TCGA data, I wanted to see how much the results differ when Mutect2 is run with versus without a matched normal, so I compared the two outputs (commands below). I'm following the current GATK workflow to filter the Mutect2 calls (i.e., FilterMutectCalls) and consider only variants with 'PASS' in the comparison.
For one sample, the results are summarized in a Venn diagram:

[Venn diagram: PASS calls from the matched-normal run vs. the tumor-only run]
I assume Mutect2 does its best to reduce errors in its calls without a paired normal, but the small overlap between the two call sets really concerns me as I prepare to run tumor-only calling on my own samples. Does anyone have suggestions for imposing more stringent filtering, i.e., keeping only the variants that appear in both the matched and the tumor-only call sets (the intersection in the Venn diagram)?
I asked for help on the GATK forums, but found that they only handle errors and bugs in their tools.
Thanks in advance.
```bash
# Mutect2 w/ matching normal
gatk Mutect2 -R hg38.fa \
    -I input_tumor.bam -I input_normal.bam \
    -tumor tumor_sample -normal normal_sample \
    -pon gatk4_mutect2_4136_pon.vcf.gz --germline-resource af-only-gnomad.hg38.vcf.gz \
    --af-of-alleles-not-in-resource 0.0000025 \
    -L exome_autoXYM.intervals -O mt2_matched.vcf.gz

# Mutect2 w/ no matching normal
gatk Mutect2 -R hg38.fa \
    -I input_tumor.bam \
    -pon gatk4_mutect2_4136_pon.vcf.gz \
    --germline-resource af-only-gnomad.hg38.vcf.gz \
    --af-of-alleles-not-in-resource 0.0000025 \
    --genotype-germline-sites \
    -L exome_autoXYM.intervals \
    -O mt2_unmatched.vcf.gz
```
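For the intersection itself (variants that PASS in both the matched and the tumor-only runs), a minimal bash sketch using only coreutils and awk could look like the following. It assumes both VCFs have already been through FilterMutectCalls. The function names `pass_keys` and `shared_pass` and the `*.filtered.vcf.gz` file names are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): list PASS variants shared by two
# filtered Mutect2 VCFs, keyed on CHROM:POS:REF:ALT.
set -euo pipefail

# Print one sorted, de-duplicated "CHROM:POS:REF:ALT" key per PASS record.
# Works on plain or gzip/bgzip-compressed VCFs (bgzip output is gzip-readable).
# A multi-allelic ALT field is kept as a single key, unsplit.
pass_keys() {
  local vcf="$1"
  if [[ "$vcf" == *.gz ]]; then gzip -dc -- "$vcf"; else cat -- "$vcf"; fi \
    | awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 ":" $4 ":" $5 }' \
    | LC_ALL=C sort -u
}

# Print the keys present in both files; comm needs both inputs sorted
# with the same collation, hence LC_ALL=C in both places.
shared_pass() {
  LC_ALL=C comm -12 <(pass_keys "$1") <(pass_keys "$2")
}
```

Usage would be something like `shared_pass mt2_matched.filtered.vcf.gz mt2_unmatched.filtered.vcf.gz > shared_pass.txt`. Matching on exact REF/ALT strings means an indel written differently in the two files will not match, so normalizing both files first (e.g. with `bcftools norm`) is worth considering. `bcftools isec` offers the same kind of intersection directly on indexed VCFs.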