Question: How much memory do we need for pruning a variation graph of a chromosome?
16 months ago, verne91 wrote:

I am trying to prune the graph for each chromosome, but the memory usage differs a lot between chromosomes. Some chromosomes use about 700 GB and some use more than 1.5 TB. The usage also does not seem to be related to the size of the .vg file. How can we determine the memory usage of vg prune?

Tags: variation graph, vg
15 months ago, Jouni Sirén (UCSC Genomics Institute) wrote:

The memory usage is effectively unbounded in v1.19.0. It will be far more reasonable in v1.20.0, and the changes are already in the master branch.

The vg prune subcommand enumerates all 24 bp paths in the graph. If a path makes more than 3 edge choices, vg stores the edge that violated the constraint. In v1.19.0, the same edge can be stored multiple times if we encounter it on different paths. In the worst case, we may end up storing an edge for every 24 bp path in the graph. If the graph is complex enough, this can be orders of magnitude more than what fits in the memory of any computer.
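To make the duplication concrete, here is a minimal Python sketch of the idea. It is a toy model and not vg's actual code: every node is assumed to be 1 bp long, an "edge choice" is assumed to mean leaving a node that has more than one outgoing edge, and the names build_bubble_chain and violating_edges are made up for this illustration. The walk enumeration appends one entry per violating walk, which mirrors how the same edge can be stored many times.

```python
def build_bubble_chain(num_sites):
    """Toy graph: a chain of SNP-like bubbles. Backbone node b<i> branches to
    two allele nodes, and both alleles rejoin at backbone node b<i+1>."""
    edges = {}
    for i in range(num_sites):
        ref, alt, nxt = f"s{i}r", f"s{i}a", f"b{i + 1}"
        edges[f"b{i}"] = [ref, alt]
        edges[ref] = [nxt]
        edges[alt] = [nxt]
    edges[f"b{num_sites}"] = []
    return edges


def violating_edges(edges, max_len, max_choices):
    """Enumerate every walk of up to max_len nodes from every start node.
    Whenever extending a walk would exceed max_choices branch choices,
    record the offending edge. One entry is appended per violating walk,
    so the same edge can appear many times in the result."""
    found = []
    for start in edges:
        # Breadth-first frontier of (node, walk length in nodes, choices so far).
        frontier = [(start, 1, 0)]
        while frontier:
            next_frontier = []
            for node, length, choices in frontier:
                if length >= max_len:
                    continue
                successors = edges[node]
                step = 1 if len(successors) > 1 else 0
                for succ in successors:
                    if choices + step > max_choices:
                        found.append((node, succ))
                    else:
                        next_frontier.append((succ, length + 1, choices + step))
            frontier = next_frontier
    return found


graph = build_bubble_chain(10)
stored = violating_edges(graph, max_len=24, max_choices=3)
print(len(stored), "stored entries,", len(set(stored)), "distinct edges")
```

Even in this tiny graph the stored entries far outnumber the distinct edges, and the gap grows with the number of walks that reach each edge.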

v1.20.0 detects when an edge is encountered multiple times and stores it only once. It also uses DFS instead of BFS to save more memory. Pruning complex graphs may still take a very long time, as we have to enumerate the paths anyway. To deal with this, v1.20.0 adds the option -M for removing high-degree nodes (e.g. -M 32 to remove nodes with degree > 32). We are still not sure whether we should enable this by default.
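The two changes can be sketched in Python as follows. Again, this is a toy model under stated assumptions and not the vg implementation: nodes are 1 bp each, degree is assumed to be the number of incoming plus outgoing edges, every node must appear as a key in the adjacency dict, and remove_high_degree_nodes and violating_edge_set are hypothetical names. A set keeps each violating edge once, and an explicit stack gives depth-first traversal, so the pending work stays proportional to walk depth times branching rather than to the number of walks of a given length.

```python
from collections import Counter


def remove_high_degree_nodes(edges, max_degree):
    """Drop nodes whose degree (incoming + outgoing edges, an assumption of
    this sketch) exceeds max_degree, along with edges that touch them."""
    degree = Counter()
    for node, successors in edges.items():
        degree[node] += len(successors)
        for succ in successors:
            degree[succ] += 1
    keep = {node for node in edges if degree[node] <= max_degree}
    return {
        node: [succ for succ in successors if succ in keep]
        for node, successors in edges.items()
        if node in keep
    }


def violating_edge_set(edges, max_len, max_choices):
    """Depth-first enumeration of walks of up to max_len nodes. Each edge
    that would exceed max_choices branch choices is stored once in a set."""
    found = set()
    for start in edges:
        # Stack of (node, walk length in nodes, choices so far).
        stack = [(start, 1, 0)]
        while stack:
            node, length, choices = stack.pop()
            if length >= max_len:
                continue
            successors = edges[node]
            step = 1 if len(successors) > 1 else 0
            for succ in successors:
                if choices + step > max_choices:
                    found.add((node, succ))
                else:
                    stack.append((succ, length + 1, choices + step))
    return found


# Toy input: "hub" and "z" both have degree 4, the other nodes have degree 2.
toy = {
    "hub": ["a", "b", "c", "d"],
    "a": ["z"], "b": ["z"], "c": ["z"], "d": ["z"],
    "z": [],
}
filtered = remove_high_degree_nodes(toy, max_degree=3)
print(sorted(filtered))
print(violating_edge_set(filtered, max_len=24, max_choices=3))
```

Note that the set only bounds memory. The number of walks visited is unchanged, which is why filtering high-degree nodes first is what cuts the running time.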

See Issue #2480 and PR #2494 for further discussion.
