A .gfa file is a text-based file that describes the structure of a pan-genome graph. I can write a script to parse this file, but doing so is time-consuming because of the file's size.
However, VG uses several other formats, for example .gbz, .vg, and .xg. These files are all binary, so I can't easily tell what information they contain or what can be extracted from them.
I am wondering if there is any way to get the source and sequence for a specific node/segment. The source might indicate which haplotype contains this node.
vg convert can convert those formats into GFA, and vg chunk can be used to query small graph regions. However, vg chunk loads the entire graph into memory for each query. This makes it fast enough for individual interactive queries, but too slow to be very effective as a backend to programmatic queries. There's development currently underway on a more responsive SQL-based query interface here.
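Once a graph (or a chunk of it) is in GFA form, looking up one node does not require a full parser. The following is only a minimal sketch, not part of vg: it streams the file once and collects the sequence from the matching S line plus the names of any P (path) or W (walk) lines that visit the node. The function name `lookup_segment` is hypothetical, and joining the W-line sample, haplotype index, and contig with `#` is an assumption made here just to get a readable label.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches each oriented step in a GFA 1.1 walk string such as ">1<22>3".
WALK_STEP = re.compile(r"[<>]([^<>]+)")


def lookup_segment(lines: Iterable[str], node_id: str) -> Tuple[Optional[str], List[str]]:
    """Return (sequence, sources) for one segment, reading the GFA line by line.

    `sources` lists the P-line path names and W-line walks that contain the node.
    """
    sequence: Optional[str] = None
    sources: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        if record == "S" and len(fields) >= 3 and fields[1] == node_id:
            # S lines: S <name> <sequence> [optional tags]
            sequence = fields[2]
        elif record == "P" and len(fields) >= 3:
            # P lines: P <path name> <comma-separated steps, each ending in + or ->
            steps = [step[:-1] for step in fields[2].split(",")]
            if node_id in steps:
                sources.append(fields[1])
        elif record == "W" and len(fields) >= 7:
            # W lines: W <sample> <hap index> <contig> <start> <end> <walk>
            if node_id in WALK_STEP.findall(fields[6]):
                # Assumption: label the walk as sample#haplotype#contig.
                sources.append("#".join(fields[1:4]))
    return sequence, sources
```

Because the function accepts any iterable of lines, an open file handle can be passed directly (for example `open("graph.gfa")`, where the filename is a placeholder), so the whole graph never has to sit in memory. A single pass is still linear in file size, so this suits occasional lookups on a chunk rather than repeated queries against a whole-genome graph.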