Question regarding germline variant calling from exome data:
I have a dataset of ~500 exome samples prepared with 6 different kits (the dataset was built up over ~6-7 years, so as kits evolved, the most recent samples were prepared with the most up-to-date kits and the oldest samples with the oldest kits).
As the targeted genomic regions differ between exome kits, which interval file should I use in the GATK best-practices pipeline (BWA -> MarkDuplicates -> BQSR -> HaplotypeCaller -> GenomicsDBImport -> GenotypeGVCFs -> VariantRecalibrator)? My first idea is to use the union of all interval files (one per exome kit), but I'm wondering whether the VQSR part of the GATK pipeline will struggle, since not every sample covers the whole "union" interval set.
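To make the "union" idea concrete, here is a minimal sketch in plain Python. It assumes each kit's targets are available as BED-style half-open (chrom, start, end) tuples; the function name `union_intervals` and the example coordinates are made up for illustration, and in practice a dedicated tool would normally do this merge.

```python
from collections import defaultdict


def union_intervals(kits):
    """Merge the target intervals of several kits into one non-overlapping set.

    kits: iterable of lists of (chrom, start, end) tuples, BED-style half-open.
    Returns a list of merged (chrom, start, end) tuples.
    Chromosomes are sorted lexicographically here, not in reference order.
    """
    by_chrom = defaultdict(list)
    for intervals in kits:
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or directly adjacent: extend the current interval.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical targets for two kits.
kit_a = [("chr1", 100, 200), ("chr1", 300, 400)]
kit_b = [("chr1", 150, 250), ("chr2", 10, 20)]
union = union_intervals([kit_a, kit_b])
```

Regions covered by only one kit (such as the chr2 interval above) still end up in the union, which is exactly why some samples will have little or no coverage over parts of it.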
Other idea: for each kit, call variants using only the associated samples and interval file, then merge the VCFs after VQSR filtering.
Any advice? Thanks
I opened a thread on the GATK forum, as this is really specific to this tool: https://gatkforums.broadinstitute.org/gatk/discussion/24168/strange-tranche-plot-after-gatk4-germline-snps-pipeline
In a nutshell, I managed to improve the Ti/Tv ratio by running each sample with its respective interval file (from the corresponding exome kit), then using the union of these intervals for the steps after HaplotypeCaller (joint genotyping and VQSR).
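For anyone wanting to reproduce this layout, here is a rough sketch of how the commands could be assembled in Python. It only builds the argument lists and does not run GATK. It assumes the per-kit interval file is applied at the HaplotypeCaller step in GVCF mode; all file names, the sample name, and the workspace name are hypothetical placeholders, and other options you may need (memory, padding, resources) are left out.

```python
def haplotypecaller_cmd(sample, bam, kit_intervals, reference):
    """Per-sample calling restricted to the intervals of that sample's own kit."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", kit_intervals,      # kit-specific interval file
        "-ERC", "GVCF",           # emit a per-sample GVCF for joint genotyping
        "-O", f"{sample}.g.vcf.gz",
    ]


def joint_genotyping_cmds(gvcfs, union_intervals, reference,
                          workspace="cohort_db", out_vcf="cohort.vcf.gz"):
    """Joint genotyping over the union of all kit interval files."""
    import_cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", union_intervals,    # union of the kit interval files
    ]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]

    genotype_cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-L", union_intervals,
        "-O", out_vcf,
    ]
    return import_cmd, genotype_cmd


# Hypothetical example: one sample from "kit A", then the joint steps.
hc = haplotypecaller_cmd("sample1", "sample1.bam", "kitA.interval_list", "ref.fasta")
imp, geno = joint_genotyping_cmds(["sample1.g.vcf.gz", "sample2.g.vcf.gz"],
                                  "union.interval_list", "ref.fasta")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files. VariantRecalibrator and ApplyVQSR would then run on the joint VCF as usual.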