I have question about filtering NGS data with GATK workflow. I have data from NGS Illumina (Sureselect kit), it is targeted gene panel (about 112 genes), usually we are using two softwares - Surecall and Nextgene with default settings. Output of this analyses is VCF file with about 500 variants.
Now I am trying to implement GATK workflow from fastq to VCF. (this workflow: https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS)
Basically it is - bam created by bwa-mem, then I remove duplicates and do base recalibration. Then I use Haplotype Caller for creating gvcf and finally VCF.
Last step that I do, is hard filtering raw snps VCF with filters from GATK -
--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
Problem is, that after this step my vcf has about 40 000 variants passing the filter - do you guys have any idea, where I need to change parameters to get about same amount of variants as running by softwares like SureCall?
Thanks for any tips :)