2.5 years ago
cmirchan ▴ 10

Hello vg-team,

I have a graph that I created and indexed using:

vg construct -v vars -r ref -a >graph.vg
vg index -x graph.xg graph.vg
vg index -G graph.gbwt -v vars graph.vg


The VCF used for construction has phased genotypes for all 7 chromosomes, so I would expect 14 haplotype threads. However, vg paths reports many more than that: 945.

 vg paths -g graph.gbwt -x graph.xg -E
...


I see there are two 'main' threads:

_thread_sample_contig_0_x


What are the other threads? And what does the 'x' represent? Are they just parts of the collective thread?
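
One way to see how the 945 threads are distributed is to count the fragments per haplotype. The sketch below is only an illustration and rests on an assumption this thread does not confirm: that thread names follow the pattern _thread_<sample>_<contig>_<phase>_<fragment>, so the trailing number (the 'x' above) would be a fragment counter. The helper name count_fragments and the toy input are hypothetical.

```shell
# Count how many fragments each haplotype thread was split into.
# ASSUMPTION: names look like _thread_<sample>_<contig>_<phase>_<fragment>;
# the trailing _<fragment> is stripped to obtain one key per haplotype.
count_fragments() {
  awk '{ key = $1; sub(/_[0-9]+$/, "", key); n[key]++ }
       END { for (k in n) print k "\t" n[k] }' | LC_ALL=C sort
}

# Toy input in the "name<TAB>length" layout (names and lengths are made up).
printf '%s\t%s\n' \
  _thread_s1_chr1_0_0 100 \
  _thread_s1_chr1_0_1 50 \
  _thread_s1_chr1_1_0 150 | count_fragments

# With a real index, the listing command from the question could be piped in:
#   vg paths -g graph.gbwt -x graph.xg -E | count_fragments
```

If the assumption holds, any haplotype whose count is greater than 1 was broken into several threads, and an unbroken index would show a count of 1 on every line.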

2.5 years ago
glenn.hickey ▴ 250

Ambiguities, conflicts or missing data in the phasing information in the VCF will cause the haplotype threads to be broken up. Adding the -P option to your index -G command to force phasing at unphased genotypes may resolve this.


I rebuilt my index with the -P option, but still ended up with 945 paths. Is there anything else I could try?


Sometimes haplotypes contain alternate alleles of overlapping variants that make no sense together (under the vg interpretation of the VCF). By default, this causes a phase break in GBWT construction. With option -o, the construction will use the reference allele for the variant that occurs later in the file in such cases. Together with -P, this option will guarantee haplotype paths spanning the entire contig. However, in some cases the paths will end up using edges that do not exist in the graph.
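
A minimal sketch of rebuilding the index with both options, reusing the placeholder file names from the question (vars, graph.vg, graph.xg, graph.gbwt). The guard around the commands is an addition for illustration so the snippet does nothing when vg or the graph is missing; it is not part of the original workflow.

```shell
vcf=vars            # phased VCF used for construction (placeholder from the question)
graph=graph.vg      # graph built with vg construct -a
gbwt_opts="-P -o"   # -P: force phasing, -o: discard unresolvable overlaps

if command -v vg >/dev/null 2>&1 && [ -f "$graph" ]; then
  # Rebuild the GBWT with both options (left unquoted so they split into two flags).
  vg index -G graph.gbwt -v "$vcf" $gbwt_opts "$graph"
  # List the haplotype paths again and count them.
  vg paths -g graph.gbwt -x graph.xg -E | wc -l
else
  echo "vg or $graph not available; skipping rebuild" >&2
fi
```

For diploid samples, the count printed at the end would be expected to drop to two paths per sample per contig if every haplotype now spans its whole contig.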

3 months ago

Hello, I'm leaving this answer here for people in the future who may have the same problem.

I solved it by adding the "--discard-overlaps --force-phasing" arguments to the GBWT construction, as I had an unphased VCF file (Documentation here).

vg paths then showed 20 haplotypes for my 10 samples.