vg deconstruct with path sizes
2.3 years ago
egoltsman • 0

Hi, I am wondering if there is a way to output snarls with path size information. Currently, if I go the route of 'vg snarls' and then 'vg deconstruct', the VCF file contains only the variant sequences, so I am forced to parse those out and calculate the string length of each one, which is not very efficient when you throw a whole pangenome at it. If this information is already available internally during snarl calling, is there a way to extract or output it?

Thanks!

vg • 750 views
2.3 years ago
glenn.hickey ▴ 250

If I understand correctly, you want the length of each allele stored in some kind of VCF FORMAT field? I suppose this is possible, but as far as I know, most VCF parsers parse the alleles into strings in memory anyway, which would let you get the size just as efficiently.
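To illustrate the point about getting sizes straight from the parsed alleles, here is a minimal sketch in plain Python (standard library only). It assumes a tab-delimited VCF with REF in column 4 and comma-separated ALT alleles in column 5, and it reports the string length of each allele per record. The function name `allele_lengths` and the sample record are made up for illustration and are not part of vg.

```python
from typing import Iterable, Iterator, List, Tuple


def allele_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, List[int]]]:
    """Yield (chrom, pos, id, [len(REF), len(ALT1), ...]) for each VCF record.

    Hypothetical helper, not part of vg. Symbolic alleles such as <DEL>
    are not handled specially; their literal string length is reported.
    """
    for line in lines:
        # Skip blank lines, meta-information lines (##) and the header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        # Only the first five columns are needed, so stop splitting early.
        fields = line.rstrip("\n").split("\t", 5)
        chrom, pos, var_id, ref, alt = fields[:5]
        # "." in the ALT column means there are no alternate alleles.
        alleles = [ref] + ([] if alt == "." else alt.split(","))
        yield chrom, int(pos), var_id, [len(a) for a in alleles]


# Made-up record for demonstration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "x\t10\t>1>4\tA\tAT,ACGT\t60\t.\t.\n",
]
for record in allele_lengths(example):
    print(*record)
```

For a real file, the same generator can be fed an open file handle (or a `gzip.open(path, "rt")` handle for compressed VCFs), so the file is streamed one line at a time rather than loaded whole.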

As mentioned on GitHub, there should soon be an interface to get snarl traversals in GAF format using a variety of algorithms (including the one used in deconstruct -e). Hopefully that will be more efficient for you to parse.
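If the traversals do come out as standard GAF records, the path length would not need to be computed from sequence at all, since the GAF format carries it as a mandatory column. The sketch below assumes the usual 12 mandatory tab-separated GAF columns, with the path in column 6 and the path length in column 7. Whether the upcoming vg output follows this layout exactly, and what it puts in the name column, is an assumption here. The function name `gaf_path_lengths` and the sample line are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def gaf_path_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (name, path, path_length) for each GAF record.

    Hypothetical helper, not part of vg. Assumes the standard GAF layout:
    column 1 = name, column 6 = path, column 7 = path length.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip records that lack the 12 mandatory GAF columns.
        if len(cols) < 12:
            continue
        yield cols[0], cols[5], int(cols[6])


# Made-up record for demonstration only.
example_gaf = ["trav1\t5\t0\t5\t+\t>1>2>4\t5\t0\t5\t5\t5\t60\n"]
for name, path, length in gaf_path_lengths(example_gaf):
    print(name, path, length)
```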

2.3 years ago
egoltsman • 0

That's great to know. Thanks!