Question: More haplotype threads than expected
10 months ago, cmirchan wrote:

Hello vg-team,

I have a graph that I created and indexed using:

vg construct -v vars -r ref -a >graph.vg
vg index -x graph.xg graph.vg
vg index -G graph.gbwt -v vars graph.vg

The VCF used for construction has phased genotypes for all 7 chromosomes, so I would expect 14 haplotype threads. However, vg paths reports many more than that: 945.

 vg paths -g graph.gbwt -x graph.xg -E
_thread_ZI284_NC_004353.4_0_1   127100
_thread_ZI284_NC_004353.4_1_1   127104
_thread_ZI284_NC_004354.4_0_0   932781
_thread_ZI284_NC_004354.4_1_0   932778
_thread_ZI284_NC_004354.4_0_1   627525
_thread_ZI284_NC_004354.4_1_1   627553
_thread_ZI284_NC_004354.4_0_2   992875
_thread_ZI284_NC_004354.4_1_2   992884
_thread_ZI284_NC_004354.4_0_3   113038
_thread_ZI284_NC_004354.4_1_3   113036
_thread_ZI284_NC_004354.4_0_4   319932
_thread_ZI284_NC_004354.4_1_4   319953
_thread_ZI284_NC_004354.4_0_5   102680
_thread_ZI284_NC_004354.4_1_5   102686
_thread_ZI284_NC_004354.4_0_6   122150
_thread_ZI284_NC_004354.4_1_6   122160
_thread_ZI284_NC_004354.4_0_7   41509
_thread_ZI284_NC_004354.4_1_7   41514
_thread_ZI284_NC_004354.4_0_8   62633
_thread_ZI284_NC_004354.4_1_8   62637
_thread_ZI284_NC_004354.4_1_9   422021
_thread_ZI284_NC_004354.4_0_9   1177293
...

I see there are two 'main' threads:

_thread_sample_contig_0_x
_thread_sample_contig_1_x

What are the other threads? And what does the 'x' represent? Are they just parts of the collective thread?

Thanks, Cade

10 months ago, glenn.hickey wrote:

Ambiguities, conflicts, or missing data in the VCF's phasing information will cause the haplotype threads to be broken up. Adding the -P option to your vg index -G command, which forces phasing at unphased genotypes, may resolve this.
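To see how the threads are broken up, it can help to tally the path names per contig and haplotype. The following is only a sketch. It assumes the names follow the pattern _thread_<sample>_<contig>_<phase>_<fragment> suggested by the listing in the question, that the last field is a counter for the pieces of a broken-up thread, and that the sample name contains no underscores (the contig names do, so the fields are taken from the right-hand end). The function name count_fragments is made up for illustration.

```shell
#!/bin/sh
# count_fragments: reads "name <whitespace> length" lines on stdin and prints
#   sample <tab> contig <tab> phase <tab> number_of_threads
# Assumption: names look like _thread_<sample>_<contig>_<phase>_<fragment>
# and the sample name itself contains no underscores.
count_fragments() {
  awk '
    /^_thread_/ {
      n = split($1, f, "_")          # f[1] is empty, f[2] is "thread"
      phase = f[n - 1]               # second-to-last field
      contig = f[4]                  # contig may contain underscores,
      for (i = 5; i <= n - 2; i++)   # so rejoin everything up to the phase
        contig = contig "_" f[i]
      count[f[3] "\t" contig "\t" phase]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort
}

# Demo on three names taken from the listing in the question.
printf '%s\t%s\n' \
  _thread_ZI284_NC_004353.4_0_1 127100 \
  _thread_ZI284_NC_004354.4_0_0 932781 \
  _thread_ZI284_NC_004354.4_0_1 627525 | count_fragments
```

On real data the input would come from the command in the question: vg paths -g graph.gbwt -x graph.xg -E | count_fragments. A count of 1 on every line would mean each haplotype is a single thread per contig.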


I rebuilt my index with the -P option, but still ended up with 945 paths. Is there anything else I could try?

(reply written 10 months ago by cmirchan)

Sometimes haplotypes contain alternate alleles of overlapping variants that make no sense together (under the vg interpretation of the VCF). By default, this causes a phase break in GBWT construction. With option -o, the construction will use the reference allele for the variant that occurs later in the file in such cases. Together with -P, this option will guarantee haplotype paths spanning the entire contig. However, in some cases the paths will end up using edges that do not exist in the graph.
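As a sketch, combining both options with the indexing command from the question might look like the following. The file names (vars, graph.vg, graph.gbwt) are the ones used in the question. The wrapper function rebuild_gbwt and the VG variable are hypothetical and only there so the command line can be dry-run with VG=echo before running it for real.

```shell
#!/bin/sh
# Hypothetical wrapper around the indexing command from the question.
# VG defaults to the real binary; set VG=echo to print the command instead.
VG="${VG:-vg}"

# usage: rebuild_gbwt <vcf> <graph.vg> <output.gbwt>
rebuild_gbwt() {
  # -P: force phasing at unphased genotypes
  # -o: on conflicting overlapping variants, use the reference allele for
  #     the later variant instead of breaking the phase
  "$VG" index -G "$3" -P -o -v "$1" "$2"
}

# Example call (not executed here):
#   rebuild_gbwt vars graph.vg graph.gbwt
# Afterwards, recount the threads with the command from the question:
#   vg paths -g graph.gbwt -x graph.xg -E | wc -l
```

With 7 contigs and two haplotypes, the recount would be expected to give 14 if every path now spans its whole contig.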

(reply written 10 months ago by Jouni Sirén)