I have been trying to use vg call to call variants from pooled samples. I am mainly interested in using it for indels, but I first compared the vg call allele frequencies for two biallelic SNPs whose frequencies were also estimated with a different method (PoolSNP). The PoolSNP estimates seem quite reliable, but only for biallelic SNPs. vg call produces similar AFs for the variants it has called, see plots below. However, it does not call the variant in most samples, as the plots also show. The minimum sample depth in the PoolSNP calls for both sites is >40. My feeling is that it has something to do with the fact that vg call works on samples individually, without joint calling. My commands are below; I am using vg v1.26.1.
vg map -x x.xg -g x.gcsa -f 1.fq.gz -f 2.fq.gz -t 32 -Z 100 > mapped.gam
vg gamsort mapped.gam -t 32 -i mapped_sorted.gam.gai > mapped_sorted.gam
vg chunk -x x.xg -c 10 -p 3:3716672-3718990 -g -a mapped_sorted.gam -O pg > chunk.pg 2> chunk_err
vg augment chunk.pg chunk_0_3_3716352_3719336.gam -s -A chunk_1_aug.gam > chunk_1_aug.pg
vg snarls chunk_1_aug.pg > chunk_1_aug.snarls
vg pack -x chunk_1_aug.pg -g chunk_1_aug.gam -o chunk_1_aug.pack
vg call chunk_1_aug.pg -a -r chunk_1_aug.snarls -k chunk_1_aug.pack -s sample > sample_calls.vcf
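In case it helps to reproduce the comparison, here is a minimal sketch of how an alternate-allele frequency could be derived from a single-sample VCF such as sample_calls.vcf. It assumes the FORMAT column contains an AD (allelic depth) field listing the reference depth first and the alternate depths after it; I have not confirmed that every vg call record carries it. `vcf_alt_af` is a hypothetical helper name, not part of vg.

```shell
# Hypothetical helper (not a vg command): print CHROM, POS, REF, ALT and the
# fraction of reads supporting any ALT allele, computed from the AD field.
# Assumes a single sample column (column 10) and AD ordered as REF,ALT1,...
vcf_alt_af() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      n = split($9, keys, ":")          # FORMAT keys
      split($10, vals, ":")             # values for the first sample
      ad = ""
      for (i = 1; i <= n; i++) if (keys[i] == "AD") ad = vals[i]
      if (ad == "") next                # record has no AD field
      m = split(ad, d, ",")             # d[1] = REF depth, d[2..m] = ALT depths
      total = 0
      for (i = 1; i <= m; i++) total += d[i]
      if (total == 0) next              # no informative reads at this site
      printf "%s\t%s\t%s\t%s\t%.4f\n", $1, $2, $4, $5, (total - d[1]) / total
    }'
}
```

It would be run as `vcf_alt_af < sample_calls.vcf`. Sites that are missing from the output are either absent from the VCF or have no usable AD values, which is a quick way to list the samples where the variant was not called.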
I only show two SNPs here, but this happens for many others as well, and it is never the same samples that vg call calls. Please let me know if you have any insights into what is going on. Thanks so much for your time, Arun.