I built a graph-based pangenome from a reference FASTA file and a VCF file containing 150 individuals, using `vg autoindex` with the following script:
```bash
sudo chmod -R 777 /tmp
VG=$(which vg)  # or VG=/usr/bin/vg, or: unalias vg && VG=$(command -v vg)
fa="ref.fa"
vcf="samples_all.vcf.gz"
prefix="pangenome_graph"

date
echo "Autoindexing the graph..."
# To also generate an XG index alongside the GBZ, add: -R XG
"$VG" autoindex --workflow giraffe \
    --prefix "$prefix" \
    --ref-fasta "$fa" \
    --vcf "$vcf" \
    --threads 4
echo "Completed autoindexing the graph."
date
```
However, when I run the following command:
```bash
vg gbwt -L -H -Z pangenome_graph.giraffe.gbz
```
The output reports only 17 haplotypes, whereas I expected haplotype paths for all 150 individuals. I also do not see paths for all samples in the .gbz and .xg files.
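Before comparing haplotype counts, it may help to confirm what the input VCF actually contains: how many sample columns it has, and whether its genotypes are phased (`|`) or unphased (`/`), since how genotypes become haplotype paths depends on phasing. The function below is a minimal sketch; `vcf_phase_summary` is a hypothetical helper (not part of vg or bcftools), and it assumes an uncompressed VCF on standard input with a `GT` entry in the FORMAT column. Assuming diploid samples, 150 samples would correspond to up to 300 haplotypes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report sample count and genotype phasing in a VCF.
# Assumes an uncompressed VCF on stdin, with GT listed in the FORMAT column.
vcf_phase_summary() {
  awk -F'\t' '
    /^##/ { next }                      # skip meta-information lines
    /^#CHROM/ { samples = NF - 9; next } # sample columns start at column 10
    {
      # Find the position of GT within the colon-separated FORMAT field.
      n = split($9, fmt, ":")
      gt = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = i
      if (!gt) next
      for (s = 10; s <= NF; s++) {
        split($s, f, ":")
        g = f[gt]
        if (index(g, "|")) phased++        # e.g. 0|1
        else if (index(g, "/")) unphased++ # e.g. 0/1 or ./.
        else other++                       # no separator: haploid call or "."
      }
    }
    END {
      printf "samples=%d phased=%d unphased=%d other=%d\n", samples, phased, unphased, other
    }'
}
```

For example, `zcat samples_all.vcf.gz | vcf_phase_summary` (use `gzip -dc` where `zcat` does not read gzip files). To count samples only, `bcftools query -l samples_all.vcf.gz | wc -l` is an alternative.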
Could you confirm whether these files are suitable for:
- Short-read mapping using vg giraffe
- Structural variant calling using vg call
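For context, a common short-read workflow on such indexes maps reads with `vg giraffe`, summarizes read support with `vg pack`, and calls variants with `vg call`. The sketch below only prints the commands unless `DRY_RUN=0` is set. Several details are assumptions: the index file names (depending on the vg version, `autoindex` may name the minimizer and distance indexes differently), the read files `reads_1.fq.gz`/`reads_2.fq.gz`, the hypothetical `run`/`map_and_call` helpers, and the existence of `pangenome_graph.xg`, which is only written if `-R XG` was requested. It also assumes the XG and GBZ come from the same `autoindex` run, so node IDs agree.

```bash
#!/usr/bin/env bash
# Sketch only: map with vg giraffe, then call variants with vg pack + vg call.
# File names and helpers are hypothetical; check what autoindex actually wrote.
set -euo pipefail

prefix="pangenome_graph"
threads=4
DRY_RUN="${DRY_RUN:-1}"   # default: print commands without running them

run() {
  # Print the command; execute it only when DRY_RUN=0.
  # eval lets the ">" redirections inside the strings work (assumes no spaces in paths).
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    eval "$@"
  fi
}

map_and_call() {
  local r1="$1" r2="$2" out="$3"
  # Paired-end mapping against the GBZ with its minimizer and distance indexes.
  run "vg giraffe -Z ${prefix}.giraffe.gbz -m ${prefix}.min -d ${prefix}.dist -f $r1 -f $r2 -t $threads > ${out}.gam"
  # Compute read support, ignoring alignments with MAPQ below 5.
  run "vg pack -x ${prefix}.xg -g ${out}.gam -Q 5 -t $threads -o ${out}.pack"
  # Call variants (including SVs represented in the graph) from the support.
  run "vg call ${prefix}.xg -k ${out}.pack -t $threads > ${out}.vcf"
}

map_and_call reads_1.fq.gz reads_2.fq.gz sample1
```

Printing by default makes it easy to compare the file names against the actual `autoindex` output before anything runs; `DRY_RUN=0 bash script.sh` executes the commands.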
I have attached a screenshot for reference.
Looking forward to your guidance.
Thank you.