How much memory do we need for pruning a variation graph of a chromosome?
4.6 years ago
verne91 ▴ 20

I am trying to prune the graph for each chromosome, but memory usage varies widely between chromosomes: some runs use about 700 GB, while others use more than 1.5 TB. The usage also does not seem to correlate with the size of the .vg file. How can I estimate the memory usage of vg prune?

vg variation graph • 1.2k views
4.6 years ago
Jouni Sirén ▴ 360

The memory usage is effectively unlimited in v1.19.0. It will be far more reasonable in v1.20.0, and the changes are already in the master branch.

The vg prune subcommand enumerates all 24 bp paths in the graph. If a path makes more than 3 edge choices, vg stores the edge that violated the constraint. In v1.19.0, the same edge can be stored multiple times if it is encountered on different paths. In the worst case, vg may end up storing an edge for every 24 bp path in the graph. If the graph is complex enough, this can be orders of magnitude more than fits in the memory of any computer.
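To make the blow-up concrete, here is a toy Python sketch of the idea, not vg's actual implementation. It assumes a simplified graph model (a dict of node -> (length in bp, successors)), starts walks only at node boundaries, and counts an "edge choice" whenever a walk leaves a node with more than one successor. The names `build_bubble_chain` and `violating_edges` are made up for this illustration.

```python
def build_bubble_chain(n_bubbles):
    """Toy graph: a chain of SNP-like bubbles, every node 1 bp long.

    Returns a dict: node -> (length in bp, list of successor nodes).
    """
    graph = {}
    for i in range(n_bubbles):
        anchor, alt_a, alt_b, nxt = f"s{i}", f"a{i}", f"b{i}", f"s{i + 1}"
        graph[anchor] = (1, [alt_a, alt_b])  # branching node: 2 choices
        graph[alt_a] = (1, [nxt])
        graph[alt_b] = (1, [nxt])
    graph[f"s{n_bubbles}"] = (1, [])  # end of the chain
    return graph


def violating_edges(graph, k=24, max_choices=3):
    """Enumerate walks of up to k bp from every node (simplified).

    Each time a walk would make more than max_choices edge choices,
    the offending edge is appended to a list, duplicates included.
    """
    records = []

    def walk(node, length, choices):
        seq_len, successors = graph[node]
        length += seq_len
        if length >= k:  # the walk already covers k bp
            return
        for nxt in successors:
            # Assumption: only edges out of branching nodes count as choices.
            c = choices + (1 if len(successors) > 1 else 0)
            if c > max_choices:
                records.append((node, nxt))  # same edge may be stored again
                continue
            walk(nxt, length, c)

    for start in graph:
        walk(start, 0, 0)
    return records


records = violating_edges(build_bubble_chain(6))
print(len(records), "stored entries for", len(set(records)), "distinct edges")
```

Even in this tiny graph the number of stored entries is far larger than the number of distinct edges, and the gap grows with the number of distinct walks that reach each edge.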

v1.20.0 detects when an edge is encountered multiple times and stores it only once. It also uses DFS instead of BFS to save more memory. Pruning complex graphs may still take a very long time, as we have to enumerate the paths anyway. To deal with this, v1.20.0 adds the option -M for removing high-degree nodes (e.g. -M 32 to remove nodes with degree > 32). We are still not sure whether this should be enabled by default.
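As a rough illustration of what a degree filter like -M does, here is a small Python sketch. It is only a sketch under assumptions: the graph is a plain list of (from, to) edges, "degree" is taken to be the total number of edge endpoints touching a node, and `remove_high_degree_nodes` is a hypothetical helper. How vg itself defines degree (for example per node side) may differ.

```python
from collections import Counter


def remove_high_degree_nodes(edges, max_degree):
    """Drop every node whose degree exceeds max_degree, plus its edges.

    Assumption: degree = number of edge endpoints at the node
    (incoming and outgoing edges counted together).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    removed = {node for node, d in degree.items() if d > max_degree}
    kept = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return kept, removed


# A hub with 5 neighbours next to a simple chain x -> y -> z.
edges = [("hub", f"n{i}") for i in range(5)] + [("x", "y"), ("y", "z")]
kept, removed = remove_high_degree_nodes(edges, max_degree=4)
print(removed)  # the hub is the only node above the threshold
print(kept)     # only the chain edges remain
```

Removing the hub up front means no walk ever has to branch through it, which is why a filter of this kind can cut the enumeration time on complex regions.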

See Issue #2480 and PR #2494 for further discussion.
