Unable to find most SVs in constructed graph.vg
10 months ago
Maxine ▴ 40

Hi vg team,

I followed the instructions in the guide "Working with a whole genome variation graph" to construct my own variation graph. After constructing the graph, I wanted to check whether all the structural variants (SVs) in my input.vcf file were carried over into the graph. My approach was to run vg deconstruct to generate a new VCF file, new.vcf, and then compare the two VCF files to see whether they contained the same, or at least a similar, set of SVs.

However, when I compared the SVs for the chromosome "NC_058080.1_1" between the original input.vcf file and the new.vcf file, I noticed a significant difference. The input.vcf file contained 101,242 SVs for this chromosome, whereas the new.vcf file only had 441 SVs for the same chromosome.
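
For anyone who wants to reproduce this kind of per-chromosome comparison, here is a minimal sketch of one way to count records. It assumes plain-text (or decompressed) VCF input and counts every non-header record on the given contig, without any filtering by variant size or type. The function name count_records is made up for illustration and is not part of vg.

```shell
# Count non-header VCF records whose CHROM column matches one contig.
# Reads the named file(s), or stdin when no file is given.
count_records() {
    contig="$1"; shift
    awk -v c="$contig" -F'\t' '!/^#/ && $1 == c { n++ } END { print n + 0 }' "$@"
}

# Example calls (file names taken from the post above):
#   gzip -dc input.vcf.gz | count_records NC_058080.1_1
#   count_records NC_058080.1_1 new.vcf
```

Keep in mind that the two counts are not guaranteed to match even for a correct graph, since vg deconstruct may represent overlapping or nested variants as a different number of records than the input VCF.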

I'm unsure at which step I might have made a mistake. To provide a clearer picture, I will list all the commands I used. Hopefully, this information will help identify any potential errors or issues in the process.

vg version: v1.48.0 "Gallipoli"

Input VCF: input.vcf, which contains unphased SVs

# graph construct
vg construct -f -S -a -t 1 -R NC_058080.1_1 -r ref.fna -v input.vcf.gz > NC_058080.1_1.graph_div_12bufo.vg

# deconstruct
vg deconstruct -t 16 --verbose -a NC_058080.1_1.graph_div_12bufo.vg > new.vcf

Looking forward to your help. Thanks in advance!

Maxine

vg • 951 views
10 months ago
glenn.hickey ▴ 520

vg deconstruct only works properly when haplotypes are available as paths in your graph.

If you use vg autoindex --workflow giraffe to construct your graph, it will produce a .gbz file with haplotypes embedded. Passing this .gbz to deconstruct should work as intended.
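
A rough sketch of what that two-step workflow could look like, reusing the file names and the deconstruct flags (-a, -t 16) from the question. The wrapper function deconstruct_from_gbz and the "bufo" prefix are hypothetical, and the exact options may differ between vg versions, so check vg autoindex --help and vg deconstruct --help before relying on it.

```shell
# Sketch only: build Giraffe indexes (including a GBZ with haplotype paths),
# then deconstruct from that GBZ. Arguments: reference FASTA, VCF, output prefix.
deconstruct_from_gbz() {
    ref="$1"; vcf="$2"; prefix="$3"
    # Expected to write ${prefix}.giraffe.gbz along with the other Giraffe indexes
    vg autoindex --workflow giraffe -r "$ref" -v "$vcf" -p "$prefix" &&
    # Emits the deconstructed VCF on stdout
    vg deconstruct -a -t 16 "${prefix}.giraffe.gbz"
}

# Example call (the "bufo" prefix is a placeholder):
#   deconstruct_from_gbz ref.fna input.vcf.gz bufo > new.vcf
```

The resulting new.vcf can then be compared against input.vcf in the same way as before.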


Thank you for your response. I still have some questions regarding Giraffe. In my case, the input.vcf file is unphased and contains numerous structural variants (SVs), so overlaps between SVs are quite common. According to the guide "Mapping short reads with Giraffe", Giraffe might not be the most suitable mapper in this situation. My concern is whether this will also affect the accuracy of vg deconstruct. If so, is there a better way to test the quality and accuracy of the graph.vg file I constructed? Any guidance or suggestions would be greatly appreciated.

10 months ago
glenn.hickey ▴ 520

I think autoindex will invent a covering set of haplotypes if your VCF is unphased.

In general, unless you get warnings telling you otherwise, the graph you get from vg construct will be equivalent to the VCF you input to it.


Understood, thank you for your assistance! My final question is about my unphased VCF data: which mapper would be the most suitable in this case? I'm particularly concerned about whether vg giraffe is suitable for my dataset. Your guidance is greatly appreciated!

9 months ago
glenn.hickey ▴ 520

vg giraffe should still work pretty well, and would remain my suggestion for short read mapping.
