I have been developing a computational pipeline with VG Toolkit for human WGS analysis. I built a genome graph from GRCh38 and variants from the 1000 Genomes Project, which serves as the reference for read mapping (using vg map) and variant calling (using vg call).
VG Toolkit calls considerably more variants in human WGS data than other variant callers, and the QC metrics of those variants (such as the transition/transversion (Ts/Tv) ratio for SNPs) do not match the expected values.
The VG documentation suggests some best practices, such as filtering out mappings and bases with quality < 5 before augmenting the graph. Increasing this threshold did not have a substantial effect on the QC metrics of the called variants.
After filtering out low-quality variants (by the Phred-scaled QUAL score in the VCF file), the QC metrics approach the expected values. A quality cut-off of 100 gives good QC metrics, but at this cut-off nearly half of the initially called variants are filtered out.
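For anyone who wants to reproduce this kind of comparison, the check can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not necessarily the tool behind the numbers above: the function name `ts_tv_by_cutoff` and the file name `calls.vcf` are placeholders, it only looks at biallelic SNPs, and it assumes an uncompressed, tab-separated VCF with a numeric QUAL column.

```python
# Sketch: Ts/Tv ratio of biallelic SNPs at several QUAL cut-offs.
# Assumes an uncompressed, tab-separated VCF; names are illustrative only.

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_by_cutoff(vcf_lines, cutoffs):
    """Return {cutoff: (n_snps, ts_tv)} for biallelic SNPs with QUAL >= cutoff.

    ts_tv is None when no transversions pass the cut-off.
    """
    counts = {c: [0, 0] for c in cutoffs}  # cutoff -> [transitions, transversions]
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete VCF record
        ref, alt, qual = fields[3].upper(), fields[4].upper(), fields[5]
        # Keep only single-base REF/ALT pairs (drops indels and multiallelic sites).
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        try:
            q = float(qual)
        except ValueError:
            continue  # QUAL is "." or otherwise not numeric
        is_transition = (ref, alt) in TRANSITIONS
        for c in cutoffs:
            if q >= c:
                counts[c][0 if is_transition else 1] += 1
    result = {}
    for c, (ts, tv) in counts.items():
        result[c] = (ts + tv, ts / tv if tv else None)
    return result


# Example usage (file name is a placeholder):
# with open("calls.vcf") as handle:
#     for cutoff, (n, ratio) in ts_tv_by_cutoff(handle, [0, 30, 60, 100]).items():
#         print(cutoff, n, ratio)
```

Running this over a range of cut-offs shows how many SNPs survive at each threshold and how the Ts/Tv ratio moves, which for human WGS is commonly cited as being around 2.0 to 2.1.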
Are there any best practices for parameter tuning for human WGS analysis with vg? Is there a recommended quality cut-off for variants called with the vg call command? Or is there a better method for selecting the good-quality variants from all the variants called by VG Toolkit?