Query Regarding Missing Sample Paths in Graph-Based Pangenome Construction
4 months ago
s_135 ▴ 10

I constructed a graph-based pangenome using a reference FASTA file and a VCF file containing 150 individuals. I used the vg autoindex command with the following script:

sudo chmod -R 777 /tmp

VG=$(which vg)  # or VG=/usr/bin/vg, or unalias vg && VG=$(command -v vg)
fa="ref.fa"
vcf="samples_all.vcf.gz"
prefix="pangenome_graph"

date
echo "Autoindexing the graph..."
"$VG" autoindex --workflow giraffe \
    --prefix "$prefix" \
    --ref-fasta "$fa" \
    --vcf "$vcf" \
    --threads 4
    # -R XG  # Generate an XG index alongside GBZ
echo "Completed autoindexing the graph."
date

However, when I run the following command:

vg gbwt -L -H -Z pangenome_graph.giraffe.gbz

I see only 17 haplotypes, whereas I expected paths for all 150 individuals. I also do not see all of the sample paths in the .gbz and .xg files.

Could you confirm whether these files are suitable for:

  1. Short-read mapping using vg giraffe
  2. Structural variant calling using vg call

I have attached a screenshot for reference.


Looking forward to your guidance.

Thank you

variant-calling pangenome giraffe vg haplotypes • 453 views
4 months ago
Jouni Sirén ▴ 680

You have a GBZ graph based on a path cover of the graph built from the reference and the variants in the VCF file. For one reason or another, vg autoindex determined that your VCF file does not contain phased haplotypes. The log from the vg autoindex run could contain more information. Because Giraffe needs haplotypes, and because a GBZ graph only contains the subgraph covered by paths, vg autoindex proceeded to cover the graph with arbitrary paths.

A path cover graph is sufficient for Giraffe and downstream analysis, but actual haplotypes would improve the accuracy.
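A quick way to check whether the VCF carries phased genotypes before rerunning vg autoindex is to count genotype calls that use the phased separator "|" versus the unphased separator "/". The sketch below is only an illustration: the function name count_phasing is made up for this example, and it assumes that GT is the first FORMAT field (as the VCF specification requires when GT is present) and that the file is plain text or gzip/bgzip-compressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vg): count phased ("|") and
# unphased ("/") genotype calls across all samples in a VCF.
count_phasing() {
    local vcf="$1"
    # gzip -cdf decompresses gzip/bgzip input and passes plain text through.
    gzip -cdf "$vcf" | awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            for (i = 10; i <= NF; i++) {    # sample columns start at column 10
                split($i, f, ":")           # GT is the first colon-separated field
                if (index(f[1], "|") > 0)      phased++
                else if (index(f[1], "/") > 0) unphased++
            }
        }
        END { printf "phased=%d unphased=%d\n", phased, unphased }
    '
}

# Run on a file only when one is given on the command line.
if [ "$#" -ge 1 ]; then
    count_phasing "$1"
fi
```

Running it on samples_all.vcf.gz and seeing phased=0 (or a very small phased count) would be consistent with a VCF that has no usable phased haplotypes. The log from the vg autoindex run remains the authoritative source for what the tool actually decided.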
