I have a few cancer samples that were analyzed with the GATK germline pipeline (SNVs called per sample, not in a joint cohort setting). We recently had the corresponding normal samples sequenced, and I ran the same GATK pipeline on them as well.
I obtained one set of somatic calls by subtracting each normal sample's germline calls from the corresponding cancer sample's germline calls. I then ran Strelka on each cancer/normal pair. Finally, for each pair, I compared the Strelka somatic calls with the calls obtained from the germline subtraction.
To my surprise, they are very different: only 20%-40% of positions match, depending on the sample. To my knowledge, a match of 75%+ is expected. This level of inconsistency makes me hesitate to move further with this project. Any thoughts on this? (Default settings were used for all calling, and all my samples have 30X+ coverage.)
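To make "match" concrete, here is a minimal sketch of how such a percentage could be computed from two call sets. It assumes plain-text VCF records keyed by chromosome, position, REF and ALT. The function names `load_sites` and `concordance` and the toy records are hypothetical, for illustration only, and not output from my samples. Note that the percentage depends on which call set is used as the denominator.

```python
def load_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared separately.
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def concordance(a, b):
    """Summarise the overlap of two site sets with several denominators."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy records (hypothetical): CHROM, POS, ID, REF, ALT
strelka_lines = [
    "\t".join(["chr1", "100", ".", "A", "T"]),
    "\t".join(["chr1", "200", ".", "G", "C"]),
    "\t".join(["chr2", "50", ".", "T", "G"]),
]
subtracted_lines = [
    "\t".join(["chr1", "100", ".", "A", "T"]),
    "\t".join(["chr1", "300", ".", "C", "A"]),
    "\t".join(["chr2", "50", ".", "T", "G"]),
    "\t".join(["chr2", "75", ".", "G", "A"]),
    "\t".join(["chr3", "10", ".", "A", "C"]),
]

result = concordance(load_sites(strelka_lines), load_sites(subtracted_lines))
print(result)
```

In this toy case the same two shared sites give quite different "match" fractions depending on whether they are divided by the first set, the second set, or the union, so it is worth stating which one is meant when quoting a figure.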
[A little more detail on how I did the subtraction, in case it's relevant: I know that, unlike a gVCF, a regular VCF does not record positions that were not sequenced well, so I ignored positions present in only one of the two germline VCF files (a very small portion anyway) and only looked at changes in genotype (zygosity) at each shared position.]
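The subtraction described above can be sketched roughly as follows. This is a simplified illustration, not my exact script: it assumes single-sample VCFs with a `GT` field in the FORMAT column, and the names `load_genotypes` and `genotype_changes` and the toy records are hypothetical.

```python
def load_genotypes(vcf_lines):
    """Map (chrom, pos) -> (alt, genotype alleles) for a single-sample VCF."""
    calls = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        # Ignore phasing and allele order so that 0|1, 1|0 and 0/1 compare equal.
        alleles = tuple(sorted(gt.replace("|", "/").split("/")))
        calls[(chrom, pos)] = (alt, alleles)
    return calls


def genotype_changes(tumor, normal):
    """Keep positions present in BOTH files whose ALT or genotype differs.

    Positions found in only one file are skipped, as described in the post.
    """
    shared = tumor.keys() & normal.keys()
    return {
        site: (normal[site], tumor[site])
        for site in sorted(shared)
        if tumor[site] != normal[site]
    }


def record(chrom, pos, ref, alt, gt, depth):
    """Build one toy VCF line (hypothetical data)."""
    return "\t".join(
        [chrom, str(pos), ".", ref, alt, "50", "PASS", ".", "GT:DP", f"{gt}:{depth}"]
    )


tumor_lines = [
    record("chr1", 100, "A", "T", "1/1", 35),
    record("chr1", 200, "G", "C", "0/1", 40),
    record("chr2", 50, "T", "G", "0/1", 33),  # tumor-only site: skipped
]
normal_lines = [
    record("chr1", 100, "A", "T", "0/1", 31),
    record("chr1", 200, "G", "C", "0/1", 38),
    record("chr3", 10, "A", "C", "0/1", 30),  # normal-only site: skipped
]

changes = genotype_changes(load_genotypes(tumor_lines), load_genotypes(normal_lines))
for site, (normal_call, tumor_call) in changes.items():
    print(site, "normal:", normal_call, "tumor:", tumor_call)
```

One consequence of this design is visible in the toy data: a variant called only in the tumor (here `chr2:50`) is dropped rather than reported as somatic, because it has no record in the normal VCF.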