How to run vg giraffe for SV genotyping?
18 months ago
brentp 24k

This thread: https://github.com/vgteam/vg/issues/3316 seems to be the most comprehensive documentation trail on how to run it, but I find it nearly impossible to follow.

Is there documentation on a simple set of commands like:

vg autoindex ??
vg map $fastqs ...
vg giraffe $call-set > genotyped-svs.vcf


That I can see to understand the basic steps?

Then it might be feasible for me to understand the workflows.

Thanks


Tagging glenn.hickey

18 months ago
Jouni Sirén ▴ 300

First, things depend on the data you have. Do you have a reference genome and a VCF file, and does the VCF contain genotype information? Or do you have a graph as a GFA file, and does the GFA contain no paths, the reference genome as paths, or haplotypes as paths?

In many cases, vg autoindex can build the indexes you need automatically, but it does not work yet with all common input types. (I don't think it can handle the "GFA with haplotypes" case.) Sometimes you have to build the indexes manually.

Second, vg giraffe is a short read mapper. It is much faster and a bit more accurate than vg map, but it requires a representative set of haplotypes to work properly. You can find basic instructions on running Giraffe in the vg wiki.

The SV calling pipeline uses vg pack and vg call. There are some instructions in the wiki, but because I'm not working on that part of vg, I'm not sure if the instructions are still valid.
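For the simplest case (a reference FASTA plus a VCF with phased genotypes), the steps above could be strung together roughly as follows. This is only a sketch, not a validated workflow: every file name is a placeholder, the index file names written by vg autoindex can differ between vg versions, and I am assuming that your vg version lets vg pack and vg call read the GBZ graph directly (older versions may need the graph in another format). The script only prints the four commands so you can review them first.

```shell
#!/bin/sh
# Sketch: autoindex -> giraffe -> pack -> call.
# All file names are hypothetical placeholders; adjust to your data.
set -eu

REF=ref.fa           # reference genome (FASTA)
VCF=svs.vcf.gz       # SV call set; assumed phased, bgzipped and tabix-indexed
FQ1=reads_1.fq.gz    # paired-end short reads
FQ2=reads_2.fq.gz
PREFIX=index         # prefix for the files written by vg autoindex
THREADS=8

# Print the commands instead of running them, so they can be checked first.
plan() {
  cat <<EOF
vg autoindex --workflow giraffe -r $REF -v $VCF -p $PREFIX -t $THREADS
vg giraffe -Z $PREFIX.giraffe.gbz -m $PREFIX.min -d $PREFIX.dist -f $FQ1 -f $FQ2 -t $THREADS > mapped.gam
vg pack -x $PREFIX.giraffe.gbz -g mapped.gam -Q 5 -t $THREADS -o sample.pack
vg call $PREFIX.giraffe.gbz -k sample.pack -s sample -t $THREADS > genotyped-svs.vcf
EOF
}

plan
```

Once the printed commands look right for your setup, `plan | sh -e` runs them in order and stops at the first failure. Here vg call genotypes the variation sites present in the graph, which was built from the VCF. Note that vg map does not appear anywhere: Giraffe takes its place as the mapper.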