I want to use the "export theta" functionality of CNVkit to estimate tumor purity with the THetA2 program.
Part of the input for THetA2 is a pair of files with SNP counts for the tumor and normal samples, formatted like:
#Chrm	Pos	Ref_Allele	Mut_Allele
10	104427	74	1
10	111955	54	0
10	135656	0	94
To the best of my understanding, these should be the germline variants in the tumor and normal samples, because they are used to estimate the B-allele frequency (BAF).
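To make the format concrete, here is a minimal Python sketch of how I imagine such a count file could be derived from a paired VCF. It is only my assumption of what is needed, not what CNVkit or THetA2 actually do internally: the function name `vcf_to_theta_counts`, the sample names, and the example records are hypothetical, and it assumes biallelic records whose FORMAT column carries an AD field (ref,alt depths), as HaplotypeCaller output typically does.

```python
def vcf_to_theta_counts(vcf_lines, sample_name, pass_only=True):
    """Yield (chrom, pos, ref_count, alt_count) for one sample column.

    Hypothetical helper: assumes biallelic records with an AD
    (ref,alt allelic depth) entry in the FORMAT column.
    """
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the column that holds the requested sample
            sample_idx = fields.index(sample_name)
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line before the records")
        chrom, pos, _id, _ref, alt, _qual, filt, _info, fmt = fields[:9]
        if pass_only and filt not in ("PASS", "."):
            continue  # optionally keep only unfiltered records
        if "," in alt:
            continue  # skip multiallelic sites
        keys = fmt.split(":")
        if "AD" not in keys:
            continue
        values = fields[sample_idx].split(":")
        ad_pos = keys.index("AD")
        if ad_pos >= len(values):
            continue  # trailing FORMAT fields may be dropped
        depths = values[ad_pos].split(",")
        if len(depths) != 2 or "." in depths:
            continue  # missing depth information
        yield chrom, int(pos), int(depths[0]), int(depths[1])


# Invented example records, for illustration only
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTumor\tNormal",
    "10\t104427\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:74,1\t0/1:40,35",
    "10\t111955\t.\tC\tT\t50\tLowQual\t.\tGT:AD\t0/1:54,0\t0/1:30,28",
]

print("#Chrm\tPos\tRef_Allele\tMut_Allele")
for row in vcf_to_theta_counts(example, "Tumor"):
    print(*row, sep="\t")
```

Calling the function once per sample column would give one count file for the tumor and one for the normal; whether the PASS filter should be applied at all is part of what I am asking.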
According to the CNVkit manual, cnvkit.py export theta accepts a VCF file:
cnvkit.py export theta Sample_T.cns reference.cnn -v Sample_Paired.vcf
However, it is unclear what kind of VCF this should be. If the germline variants are what matter, then the VCF output of programs such as MuTect2 is not appropriate, because they are geared towards somatic mutations and discard germline variants. Should I use the output of HaplotypeCaller? If so, how should Sample_Paired.vcf be organised? Furthermore, should I filter the VCF to include only PASS variants?
Am I missing something?