I have a graph that I created from a group of haploid assemblies using PGGB. What I would like to do is use VG to align short reads from new samples to the graph using giraffe, and then genotype the samples. My end goal is to have the genotype calls be consistent with the haplotypes used to construct the graph (originally existed as W-lines in the PGGB GFA, and later as GBWT threads). I have included below the workflow that I thought would work which constructs the graph from GFA, uses
vg deconstruct to create a VCF with the genotypes of all the GBWT threads, then uses
vg pack and
vg call to genotype a
.gam based on that VCF. Unfortunately, the final command using
vg call gives me the error
[VCFTraversalFinder] Warning: No alt path (prefix=_alt_6162647cdc0e686937be7380951cff5969dced64_) found in graph for variant. It will be ignored:
for every variant. From the documentation I can see that this is caused by the fact that the graph was not constructed from a FASTA + VCF with the
-a flag, so there are no alt paths in the graph.
As solutions, I attempted to use
vg call with the
-g flags, which both worked to genotype the snarls in the graph. However, different samples produced different numbers of calls, and the POS/REF/ALT determination at each snarl was not completely consistent with the VCF created using
vg deconstruct making it difficult to harmonize the calls.
Is there any workflow that I could use to provide a VCF to
vg call to allow for harmonized calls, given that I started with a graph constructed from GFA? Such as some way to turn the GBWT threads into alt paths?
#Starting from PGGB GFA output #Reference path stays as a P-line #P-lines for haplotypes modified to match (sample)_(contig)_(haplotype) for conversion to W-lines vg convert \ -g graph_plines.gfa\ -f \ -w _ \ -t 12 > graph.gfa #Create indexes from GFA vg autoindex \ -w giraffe \ -g graph.gfa \ -t 12 \ -V 1 \ -T /data/vg_temp #Extract GBWT and GBWTGraph from GBZ for later user with deconstruct vg gbwt \ -o graph.gbwt \ -g graph.gg \ -Z graph.giraffe.gbz #Create XG from GBZ for later use with deconstruct/pack/call vg convert \ -Z graph.giraffe.gbz \ -x > graph.xg #Create a VCF containing genotypes of all haplotypes vg deconstruct graph.xg \ -p PGF \ -g graph.gbwt \ -a \ -t 12 > graph.vcf #Align short reads using giraffe vg giraffe \ -Z graph.giraffe.gbz \ -m graph.min \ -d graph.dist \ -p \ -f sample.1.fq \ -f sample.2.fq > sample.gam #Create packed read support, ignoring reads with MAPQ < 5 (-Q 5) vg pack \ -x graph.xg \ -o graph.pack \ -g sample.gam \ -Q 5 #Genotype variants in graph VCF vg call graph.xg \ -p PGF \ -k sample.pack \ -v graph.vcf \ -s sample \ -t 12 > sample.vcf