I have a collection of ~100 haploid assemblies, each about 4.5 megabases, that I would like to map reads against using giraffe. However, I am not completely clear on the best practices for constructing the graph starting from the assemblies. I have used PGGB to create a GFA with haplotype information, but from the wiki and previous Biostars responses it appears that vg autoindex --giraffe only works from a VCF + reference and does not currently support working from a GFA with haplotype information.
I have considered a few options:
- Manually create all of the indexes for giraffe using the commands found here: https://github.com/vgteam/vg/wiki/Index-Types
- Use vg deconstruct to create a VCF containing all variation in the PGGB GFA graph relative to one reference, and then use that VCF + ref FASTA to run vg autoindex --giraffe.
- Use an alternative method to create a VCF from assemblies, although I am not sure which method would be best for this.
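To make the second option concrete, here is a rough sketch of what I have in mind. The function name, file names, and the reference path name are placeholders I made up. I am also assuming that the vg deconstruct in my vg version can read the PGGB GFA directly, and that the giraffe workflow of vg autoindex is selected with --workflow giraffe. I have not checked that the contig names in the resulting VCF match the FASTA headers, which autoindex would need.

```shell
# Sketch only: not a validated pipeline. Requires vg, bgzip and tabix on PATH.
# Usage: gfa_to_giraffe_indexes GFA REF_PATH REF_FASTA PREFIX [THREADS]
gfa_to_giraffe_indexes() {
  if [ "$#" -lt 4 ]; then
    echo "usage: gfa_to_giraffe_indexes GFA REF_PATH REF_FASTA PREFIX [THREADS]" >&2
    return 2
  fi
  local gfa="$1"        # PGGB output graph (GFA)
  local ref_path="$2"   # name of the reference path inside the GFA (placeholder)
  local ref_fasta="$3"  # FASTA of the same reference assembly
  local prefix="$4"     # prefix for all output files
  local threads="${5:-4}"

  # 1. Write the variation in the graph as a VCF relative to the reference path.
  vg deconstruct -p "$ref_path" -t "$threads" "$gfa" > "$prefix.vcf" || return 1

  # 2. Compress and index the VCF.
  bgzip -f "$prefix.vcf" || return 1
  tabix -p vcf "$prefix.vcf.gz" || return 1

  # 3. Build the giraffe indexes from reference FASTA + VCF.
  vg autoindex --workflow giraffe \
    -r "$ref_fasta" -v "$prefix.vcf.gz" -p "$prefix" -t "$threads"
}
```

I would call it as something like gfa_to_giraffe_indexes pggb_out.gfa REF_PATH_NAME ref.fa pangenome 8, where REF_PATH_NAME is whatever the chosen reference assembly is called in the GFA.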
I would appreciate any advice on which of these options is best, or any general advice on best practice when constructing graphs directly from haploid assemblies.