Question

How to run vg giraffe for SV genotyping?

2.8 years ago

brentp 24k

I have seen this: https://github.com/vgteam/vg/wiki/Automatic-indexing-for-read-mapping-and-downstream-inference

This thread, https://github.com/vgteam/vg/issues/3316, seems to be the most comprehensive trail of documentation on how to run it, but I find it nearly impossible to follow.

Same with: https://github.com/vgteam/vg_snakemake/blob/master/Snakefile

Is there documentation on a simple set of commands like:

vg autoindex ??
vg map $fastqs
...
vg giraffe $call-set > genotyped-svs.vcf

that I can read to understand the basic steps?

Then it might be feasible for me to understand the workflows.

Thanks


Updated 2.8 years ago by Jouni Sirén ▴ 360 • written 2.8 years ago by brentp 24k

Tagging glenn.hickey

2.8 years ago by GenoMax 141k

score 2 · Answer 1 · 2021-07-21

First, things depend on the data you have. Do you have a reference genome and a VCF file, and does the VCF contain genotype information? Or do you have a graph as a GFA file, and does the GFA contain no paths, the reference genome as paths, or haplotypes as paths?
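A quick way to answer those questions about your own files might look like the sketch below. It only inspects plain-text inputs: P lines in a GFA are paths, and W lines (walks, from GFA 1.1) are a common way to store haplotypes, though whether your vg version reads them that way is an assumption to check. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Count path (P) and walk (W) lines in a GFA file.
# P=0 W=0 suggests a graph with no paths at all.
gfa_path_counts() {
  awk -F'\t' '$1 == "P" { p++ } $1 == "W" { w++ }
              END { printf "P=%d W=%d\n", p, w }' "$1"
}

# Count sample columns in an uncompressed VCF, i.e. the fields after
# FORMAT on the #CHROM header line. 0 means there are no genotypes.
vcf_sample_count() {
  awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }' "$1"
}
```

For example, `gfa_path_counts graph.gfa` and `vcf_sample_count calls.vcf`; a bgzipped VCF would need to be decompressed to a file first.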

In many cases, vg autoindex can build the indexes you need automatically, but it does not work yet with all common input types. (I don't think it can handle the "GFA with haplotypes" case.) Sometimes you have to build the indexes manually.

Second, vg giraffe is a short read mapper. It is much faster and a bit more accurate than vg map, but it requires a representative set of haplotypes to work properly. You can find basic instructions on running Giraffe in the vg wiki.

The SV calling pipeline uses vg pack and vg call. There are some instructions in the wiki, but because I'm not working on that part of vg, I'm not sure if the instructions are still valid.
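For the simplest case (a reference FASTA plus a VCF), the steps mentioned in this thread could be chained roughly as below. Treat this as an unverified sketch, not an official workflow: all file names are hypothetical, the index file names that `vg autoindex` writes differ between vg releases, and whether `vg pack` and `vg call` accept the Giraffe graph file directly depends on your version. It uses `vg call -a`, which genotypes every snarl in the graph; genotyping the records of a specific VCF with `vg call -v` is described in the vg README but needs a graph built with alt paths, which this sketch does not do. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical inputs: replace with your own files.
REF=reference.fa      # linear reference genome
VCF=svs.vcf.gz        # bgzipped, tabix-indexed SV call set
R1=reads_1.fq.gz      # paired-end short reads
R2=reads_2.fq.gz
PREFIX=idx            # prefix for the files autoindex writes

# Assumed index names. Check what autoindex actually produced; older
# releases write .giraffe.gbwt and .gg files instead of a .giraffe.gbz.
GBZ=$PREFIX.giraffe.gbz
MIN=$PREFIX.min
DIST=$PREFIX.dist

# Print each step by default; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
step() {  # usage: step OUTFILE|- COMMAND...
  local out=$1; shift
  if [ "$DRY_RUN" = 1 ]; then
    if [ "$out" = - ]; then echo "$*"; else echo "$* > $out"; fi
  elif [ "$out" = - ]; then
    "$@"
  else
    "$@" > "$out"
  fi
}

pipeline() {
  # 1. Build the graph and the Giraffe indexes from reference + VCF.
  step - vg autoindex --workflow giraffe -r "$REF" -v "$VCF" -p "$PREFIX"
  # 2. Map the reads with Giraffe; alignments (GAM) go to stdout.
  step sample.gam vg giraffe -Z "$GBZ" -m "$MIN" -d "$DIST" -f "$R1" -f "$R2"
  # 3. Compute read support, ignoring mapping/base quality below 5.
  step - vg pack -x "$GBZ" -g sample.gam -Q 5 -o sample.pack
  # 4. Genotype every snarl in the graph from that support.
  step sample.vcf vg call "$GBZ" -k sample.pack -a
}

pipeline
```

Run it once as is to review the printed commands, then with `DRY_RUN=0` once the file names match what is on disk. Note that `vg map` does not appear: Giraffe replaces it as the mapper, and the genotyping itself happens in `vg pack` and `vg call`.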