Kraken2 nt database build time
10 months ago

Hello Everybody,

I am currently building a Kraken2 nt database. It has been running for 1.5 months and still has not finished; it is now at 98,000 hours of CPU time. Does anyone know how long it should take to finish? I'm running this on two Intel(R) Xeon(R) Gold 6252 CPUs @ 2.10GHz with 96 threads and 1006 GB RAM.
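As a rough sanity check on those numbers, the CPU time can be converted to wall-clock time. This is only a sketch and assumes all 96 threads were busy for the whole run, which is an idealization; the function name is made up for illustration.

```python
def wall_clock_days(cpu_hours, threads):
    """Convert accumulated CPU hours to wall-clock days,
    assuming every thread was fully busy the whole time."""
    return cpu_hours / threads / 24.0

cpu_hours = 98_000   # CPU time reported in the post
threads = 96         # threads reported in the post

days = wall_clock_days(cpu_hours, threads)
print(f"{days:.1f} days")  # prints "42.5 days"
```

About 42.5 days is close to 1.5 months (roughly 45 days), so the process has been consuming CPU almost continuously. That alone does not show it is making useful progress.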

Kraken2 database nt • 554 views

Kudos for being this patient; 1.5 months is an eternity. Either kraken2 is still working on the build, or the process hung a month ago and you are waiting for nothing. Can you see the output file size growing over time? What does top/htop monitoring show? If it has taken this long just to build the database, consider how long the searches may take.
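One way to check whether an output file is growing is to poll its size a few times. The sketch below is generic: the function names are invented for illustration, and whether a given file grows steadily depends on how the tool writes its output, which is not assumed here.

```python
import os
import time

def sample_sizes(path, samples=3, interval_s=60.0):
    """Return the file size in bytes at each of `samples` polls."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            time.sleep(interval_s)  # wait before the next poll
    return sizes

def is_growing(sizes):
    """True if the last observed size is larger than the first."""
    return len(sizes) >= 2 and sizes[-1] > sizes[0]

# Example usage (the path is a placeholder for a file in the build directory):
# sizes = sample_sizes("hash.k2d", samples=5, interval_s=600)
# print(sizes, "growing" if is_growing(sizes) else "not growing")
```

A file that stays the same size over several hours is a hint, not proof, that the build has stalled, since some programs hold data in memory and write it out only at the end.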

You should consider alternative approaches that may be more feasible, such as a smaller database or pre-made indexes.


Hi GenoMax, thank you for your reply.

Both top and htop show the process as running, not sleeping, and it is using resources heavily (8500% CPU and ~400 GB RAM). So far I have only one of the three expected output files (taxo.k2d.tmp). It was created one day after I started the run and its size has been constant since then; although it is only a tmp file, this may be its final version. htop shows that another file (hash.k2d) is now under construction; based on other databases, it should be at least one order of magnitude bigger than taxo.k2d.

I checked the screen session I started the process in. It says the hash table will be 320 GB and that 18 million sequences (90 billion base pairs) have been processed so far. That is a lot of k-mers to process, and the database has more than 80 million accession numbers, so it will take a really long time. I think I will look for alternatives.

All the best, Krisztián
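For a rough idea of the remaining time, the progress figures above can be extrapolated linearly. This is a sketch under strong assumptions: that each accession number corresponds to one sequence, that the processing rate stays constant, and that the 18 million figure corresponds to about 1.5 months of run time. Since the database has more than 80 million accessions and sequence lengths vary, the result is at best a lower bound. The function name is made up for illustration.

```python
def estimate_total_months(done, total, elapsed_months):
    """Linear extrapolation of total run time from progress so far."""
    if done <= 0:
        raise ValueError("no progress recorded yet")
    return elapsed_months * total / done

done = 18e6      # sequences processed so far (from the screen output)
total = 80e6     # lower bound: "more than 80 million" accession numbers
elapsed = 1.5    # months of run time so far

total_months = estimate_total_months(done, total, elapsed)
remaining = total_months - elapsed
print(f"total ~{total_months:.1f} months, remaining ~{remaining:.1f} months")
# prints "total ~6.7 months, remaining ~5.2 months"
```

Under these assumptions the build would need at least five more months, which supports looking for alternatives.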