Hi all,
We want to identify SNP and indel variants in the result VCF file BS_graph_call.vcf, which was produced with the pangenome analysis software vg.
Fewer than 20 SNP and indel records were found in BS_graph_call.vcf. However, when we called SNPs and indels from the reference Ah.genome.fa and the query BS.genome.fa using MUMmer, approximately 500,000 highly reliable variant sites were identified.
We would like to understand why so few SNPs and indels were identified in the final VCF file BS_graph_call.vcf produced by vg.
Only PAVs (presence-absence variations) were included in the input PAV.vcf.gz file.
The detailed command lines are shown below:
vg autoindex --workflow giraffe -r Ah.genome.fa -v PAV.vcf.gz -p Ah -t 100
vg giraffe -Z Ah.giraffe.gbz -m Ah.min -d Ah.dist -f BS_1.fq.gz -f BS_2.fq.gz -t 4 > BS.giraffe.gam
vg pack -x Ah.giraffe.gbz -g BS.giraffe.gam -Q 5 -o BS.pack -t 4
vg call -t 4 Ah.giraffe.gbz -k BS.pack > BS_graph_call.vcf
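
For reference, here is a minimal sketch of how the SNP and indel records in a VCF such as BS_graph_call.vcf could be tallied. The helper name count_variants is hypothetical, and classifying records by comparing REF and ALT lengths is a simplifying assumption, not necessarily how vg or MUMmer classify variants. Breakend notation is not handled.

```shell
# Hypothetical helper: tally VCF records as snp, indel, or other.
# Assumption: a record is an indel if any ALT allele differs in length from REF,
# a SNP if REF and all ALT alleles are single bases, and "other" otherwise
# (symbolic alleles such as <DEL>, missing alleles, and same-length multi-base changes).
count_variants() {
  awk -F'\t' '
    /^#/ { next }                                  # skip header lines
    {
      n = split($5, alts, ",")                     # ALT may list several alleles
      type = (length($4) == 1) ? "snp" : "other"
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a ~ /^</ || a == "." || a == "*") { type = "other"; break }
        if (length(a) != length($4)) { type = "indel" }
      }
      counts[type]++
    }
    END {
      printf "snp=%d indel=%d other=%d\n", counts["snp"], counts["indel"], counts["other"]
    }
  ' "$1"
}
```

Running count_variants BS_graph_call.vcf prints a single summary line with the three counts, which makes the comparison with the MUMmer result easier to state precisely.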
vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by ubuntu@ip-172-31-9-38