hisat2-build was killed when attempting to index the Ginkgo genome
2.0 years ago

The Ginkgo genome is about 9.8 GB, and I tried to build an index with hisat2-build using the --ss and --exon options.

However, both times I ran the job, it was killed without any error message. The run log was attached as a screenshot (image not preserved).

My server runs Ubuntu 18.04.5 LTS and has 48 CPUs and 512 GB of RAM. Why is this happening? Please help!


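See also https://stackoverflow.com/questions/726690/what-killed-my-process-and-why for background on how processes get killed and how to find out why. A process that dies silently like this has almost certainly been stopped by the Linux out-of-memory (OOM) killer, so it is almost certainly a memory issue.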
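To confirm that the OOM killer was responsible, you can search the kernel log for its "Killed process" lines. Below is a minimal sketch that assumes the usual kernel message layout ("Killed process <pid> (<name>) ... anon-rss:<n>kB"). It also assumes that the process doing the actual work appears under a name such as hisat2-build-l, since hisat2-build is a wrapper. The sample line is made up for illustration and does not come from the poster's server.

```python
import re

# Kernel OOM-killer lines typically contain "Killed process <pid> (<name>)"
# followed later by "anon-rss:<n>kB" (older kernels print "Killed process",
# newer ones "Out of memory: Killed process"; both match this pattern).
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]*)\).*?anon-rss:(\d+)kB")


def find_oom_kills(kernel_log: str) -> list:
    """Return (pid, process name, anonymous RSS in GiB) for each OOM kill found."""
    kills = []
    for line in kernel_log.splitlines():
        match = OOM_KILL.search(line)
        if match:
            pid, name, anon_kb = match.groups()
            kills.append((int(pid), name, int(anon_kb) / 1024**2))
    return kills


if __name__ == "__main__":
    # Made-up example line, NOT taken from the poster's machine.
    sample = ("[123456.789] Killed process 4242 (hisat2-build-l) "
              "total-vm:540000000kB, anon-rss:520000000kB, file-rss:0kB, shmem-rss:0kB")
    for pid, name, gib in find_oom_kills(sample):
        print(f"PID {pid} ({name}) was OOM-killed at about {gib:.0f} GiB anonymous RSS")
```

In practice you would pass in the text printed by `dmesg -T` or `journalctl -k`, either of which may need root privileges.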

2.0 years ago
Michael 54k

Also, hisat2-build has several options to tune performance and memory consumption during FM-index generation.

So it might be possible to use less memory (if that is really the problem). I would start with setting the additional parameters:

--noauto --bmaxdivn 8 --dcv 2048

If it still runs out of memory, increase these values gradually. Larger values lower memory use further, but they also increase run time.
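For reference, the sketch below shows one way to assemble the full command with these tuning flags next to --ss and --exon. It uses the option layout `hisat2-build [options] <reference_in> <ht2_base>`. All file names and the thread count are hypothetical placeholders, not values from the thread.

```python
import shlex


def hisat2_build_cmd(genome_fa, index_base, ss_file, exon_file,
                     bmaxdivn=8, dcv=2048, threads=8):
    """Assemble a hisat2-build command line with manual memory tuning.

    File names and the default thread count are placeholders. --noauto disables
    automatic selection of --bmax/--dcv so the values given here are used.
    """
    return [
        "hisat2-build",
        "-p", str(threads),
        "--ss", ss_file,
        "--exon", exon_file,
        "--noauto",
        "--bmaxdivn", str(bmaxdivn),
        "--dcv", str(dcv),
        genome_fa,   # <reference_in>
        index_base,  # <ht2_base>
    ]


cmd = hisat2_build_cmd("ginkgo.fa", "ginkgo_idx", "ginkgo.ss", "ginkgo.exon")
print(shlex.join(cmd))
```

Returning a list instead of a single string lets the same value go straight to `subprocess.run(cmd)` without shell quoting issues. `shlex.join` is used only to print a copy-pasteable version.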

From the manual:

-a/--noauto
Disable the default behavior whereby hisat2-build automatically selects values for the --bmax, --dcv and [--packed] parameters according to available memory. Instead, user may specify values for those parameters. If memory is exhausted during indexing, an error message will be printed; it is up to the user to try new parameters.
--bmax <int>
The maximum number of suffixes allowed in a block. Allowing more suffixes per block makes indexing faster, but increases peak memory usage. Setting this option overrides any previous setting for --bmax, or --bmaxdivn. Default (in terms of the --bmaxdivn parameter) is --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.
--bmaxdivn <int>
The maximum number of suffixes allowed in a block, expressed as a fraction of the length of the reference. Setting this option overrides any previous setting for --bmax, or --bmaxdivn. Default: --bmaxdivn 4. This is configured automatically by default; use -a/--noauto to configure manually.
--dcv <int>
Use <int> as the period for the difference-cover sample. A larger period yields less memory overhead, but may make suffix sorting slower, especially if repeats are present. Must be a power of 2 no greater than 4096. Default: 1024. This is configured automatically by default; use -a/--noauto to configure manually.

I tried the following parameters:

--noauto --bmaxdivn 1 --dcv 4096

but unfortunately it was still killed.


Yes, try --bmaxdivn 8, then 12, 16, and so on. According to the documentation, this value is a fraction of the reference length, so 8 means 1/8. If the process is still killed, I would increase the number further. Setting it to 1 lets a single block hold suffixes for the whole reference, so it actually used more memory, not less. It might also help to decrease the number of CPUs used for indexing, because each CPU might require its own share of memory. If none of that helps, you need to have a chat with your local IT support about how to monitor resource consumption. A last resort could be to not use the --ss and --exon annotations at all. These seem to increase memory usage, but dropping them comes at the price of not having annotated splice sites.
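To make the fraction concrete, the rough sketch below computes the per-block suffix cap implied by --bmaxdivn (reference length / N). It uses the ~9.8 GB genome size from the question and assumes roughly one base per byte. According to the manual excerpt above, larger blocks increase peak memory. N = 1 therefore allows blocks four times larger than the default N = 4.

```python
# Approximate Ginkgo genome length from the ~9.8 GB figure in the question
# (assumption: roughly one byte per base in the FASTA file).
GENOME_BP = 9.8e9


def suffixes_per_block(genome_bp: float, bmaxdivn: int) -> float:
    """Upper bound on suffixes per block implied by --bmaxdivn (length / N)."""
    return genome_bp / bmaxdivn


if __name__ == "__main__":
    for n in (1, 4, 8, 12, 16):
        billions = suffixes_per_block(GENOME_BP, n) / 1e9
        print(f"--bmaxdivn {n:>2}: at most {billions:.2f} billion suffixes per block")
```

This only shows how the block limit scales. Actual peak memory also depends on --dcv, the --ss/--exon graph construction, and the number of threads.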


Btw, could you send me the download URLs for the Ginkgo genome and annotation and splice-site files you downloaded? I couldn't find them in GenBank.

2.0 years ago
Mensur Dlakic ★ 27k

This pretty much has to be memory-related. You could investigate yourself by monitoring memory usage throughout the run.
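One way to monitor memory is to record the peak resident memory of the indexing run. The sketch below runs a command and reads `ru_maxrss` for finished children through Python's `resource` module. It assumes Linux, where that field is in kilobytes and, for `RUSAGE_CHILDREN`, reports the largest child rather than a sum. The script name `peak_mem.py` is hypothetical.

```python
import resource
import subprocess
import sys


def run_with_peak_memory(cmd):
    """Run cmd to completion; return (exit code, peak resident memory in GiB).

    Assumes Linux, where ru_maxrss is reported in kilobytes. RUSAGE_CHILDREN
    covers finished children (and descendants they waited for), and ru_maxrss
    is the largest peak among them, so measure one command per Python process.
    """
    proc = subprocess.run(cmd)
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak_kb / 1024**2


if __name__ == "__main__":
    # Usage: python peak_mem.py hisat2-build [options] <reference_in> <ht2_base>
    code, peak_gib = run_with_peak_memory(sys.argv[1:])
    # A negative exit code -N means the child died from signal N; -9 is SIGKILL,
    # the signal the kernel OOM killer sends.
    print(f"exit code {code}, peak RSS {peak_gib:.1f} GiB")
```

GNU `/usr/bin/time -v` gives similar information ("Maximum resident set size") without any script. To watch memory while the job runs, rather than only seeing the final peak, a tool such as `top` or `htop` in a second terminal is simpler.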

It is easy to think that 512 GB has to be enough, but it doesn't. The HISAT2 documentation says that doing what you are doing for the human genome requires at least 160 GB:

http://daehwankimlab.github.io/hisat2/howto/

If that's true, 512 GB would seem not to be enough for the Ginkgo genome, which is roughly three times the size of the human genome.
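A back-of-the-envelope version of this argument: scale the 160 GB figure quoted above by genome length. This assumes a human reference of about 3.1 Gbp and assumes that memory grows linearly with genome size. Both are simplifications, so treat the result as an order-of-magnitude estimate.

```python
HUMAN_GBP = 3.1      # assumption: approximate human reference length in Gbp
HUMAN_RAM_GB = 160   # figure quoted above for --ss/--exon indexing of human
GINKGO_GBP = 9.8     # genome size from the question


def linear_ram_estimate(genome_gbp, ref_gbp=HUMAN_GBP, ref_ram_gb=HUMAN_RAM_GB):
    """Scale the reference RAM figure linearly with genome length (rough assumption)."""
    return ref_ram_gb * genome_gbp / ref_gbp


if __name__ == "__main__":
    print(f"estimated RAM for Ginkgo: {linear_ram_estimate(GINKGO_GBP):.0f} GB")
```

This gives roughly 506 GB. That is essentially all of the server's 512 GB before the operating system and any other jobs are counted, which is consistent with the job being killed.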


I am just wondering what hi(gh)sat could be doing that requires ~50x the size of its input in memory?
