How long should building a database with kraken2-build take?
3.2 years ago

Hello,

I am trying to build a database containing the NCBI non-redundant nucleotide sequences (NCBI nt) plus some rumen genomes downloaded from another website. Following the Kraken 2 manual, I downloaded the NCBI taxonomy and the nt library and added the rumen genomes to the library; these steps completed successfully. However, step 3 (building the database) has now been running for 43 hours. There has been no error message, and the process is still running on the server, but I have not seen any new output in the last 24 hours. How long should I expect to wait for the build to finish? Does it usually take this long?

Below are the command line I used and the output that appeared on my screen:

./kraken2-build --build --threads 20 --db $DBNAME

Creating sequence ID to taxonomy ID map (step 1)...

Found 72450193/72530381 targets, searched through 779285158 accession IDs, search complete.

lookup_accession_numbers: 80188/72530381 accession numbers remain unmapped, see unmapped.txt in DB directory

Sequence ID to taxonomy ID map complete. [20m26.938s]

Estimating required capacity (step 2)...

Estimated hash table requirement: 219511709988 bytes

Capacity estimation complete. [58m59.084s]

Building database files (step 3)...

Taxonomy parsed and converted.

CHT created with 22 bits reserved for taxid.

Processed 13414311 sequences (68169888726 bp)...
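
For scale, the hash table estimate from step 2 can be converted to GiB and compared with the machine's RAM. This is only a rough sketch: `bytes_to_gib` is a made-up helper name, the byte count is the one printed in the log above, and the `/proc/meminfo` lookup assumes a Linux server.

```shell
#!/usr/bin/env bash
# Convert a byte count to GiB (1 GiB = 1024^3 bytes), one decimal place.
# bytes_to_gib is a hypothetical helper, not part of Kraken 2.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Value copied from the "Estimated hash table requirement" line in the log.
bytes_to_gib 219511709988

# On Linux, MemTotal is reported in kB; print it in GiB for comparison.
if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / (1024 * 1024) }' /proc/meminfo
fi
```

The first number comes out at roughly 204 GiB. If the build needs to hold a table of about that size in memory, which is my understanding but not something the log itself states, a machine with less RAM than that may struggle in step 3.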

Any information from you will be much appreciated. Thank you so much!

Using 32 threads on an AWS EC2 r4.8xlarge instance with 16 dual-core hyperthreaded 2.30 GHz CPUs and 244 GB of RAM, the build process took approximately 35 minutes in Jan. 2018.

The above is copied from https://github.com/DerrickWood/kraken2/wiki/Manual#standard-kraken-2-database. Alternatively, you can download prebuilt indices from https://benlangmead.github.io/aws-indexes/k2, as posted in "C: How do you download the nt database for Kraken2?" by h.mon.


Thanks for your reply! I really appreciate it! I previously used 20 threads and let the build run for 4 days. Following your advice, I reran it with 32 threads and hit the same issue after 5 hours: no new output, but the program keeps running without error. Memory looks fine; only 28% is in use. My database includes the NCBI non-redundant nucleotide sequences and 410 genome assemblies downloaded from another source. I added the taxon ID to the ID lines of the genome assembly files. The "add-to-library" step completed without any warning or error message, so I think it was successful. I don't know exactly what the issue is. Any further ideas would be much appreciated.
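
When the screen stays silent, one rough way to tell whether the job is still doing anything is to look for files in the database directory that changed recently. The sketch below uses a hypothetical helper, `recently_modified`. It assumes that kraken2-build writes to files under the database folder while step 3 is working, which I have not verified, so an empty result is a hint rather than proof of a hang.

```shell
#!/usr/bin/env bash
# List regular files under a directory that were modified within the
# last N minutes. recently_modified is a hypothetical helper name.
recently_modified() {
    local dir="$1" minutes="$2"
    find "$dir" -type f -mmin "-$minutes"
}

# Example (commented out; needs DBNAME to point at the database folder):
# recently_modified "$DBNAME" 60
```

Running it a few times, some minutes apart, and comparing file sizes with `ls -l` gives a slightly better picture than a single check.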


This is the command line that I used and the messages printed on the screen:

./kraken2-build --build --threads 32 --db $DBNAME

Creating sequence ID to taxonomy ID map (step 1)...

Found 72450193/72530381 targets, searched through 779285158 accession IDs, search complete.

lookup_accession_numbers: 80188/72530381 accession numbers remain unmapped, see unmapped.txt in DB directory

Sequence ID to taxonomy ID map complete. [21m18.897s]

Estimating required capacity (step 2)...

Estimated hash table requirement: 219511709988 bytes

Capacity estimation complete. [1h7m34.016s]

Building database files (step 3)...

Taxonomy parsed and converted.

CHT created with 22 bits reserved for taxid.

Processed 13414311 sequences (68169888726 bp)...
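
The counts in the log can be turned into rough percentages. Both the 20-thread and the 32-thread runs end at the same "Processed 13414311 sequences" line. The sketch below assumes the 72530381 targets from step 1 equal the total number of library sequences, which the log does not say outright; `percent_of` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print part/total as a percentage with one decimal place.
# percent_of is a hypothetical helper, not part of Kraken 2.
percent_of() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * p / t }'
}

percent_of 13414311 72530381   # sequences processed when output stopped
percent_of 80188 72530381      # accession numbers left unmapped
```

Under that assumption, the output stops at about 18.5% of the sequences, and about 0.1% of accession numbers are unmapped. Stopping at the identical count in two runs with different thread counts may point to a particular sequence or region of the library rather than to plain slowness, though that is only a guess from these two logs.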

Did you replace $DBNAME with the appropriate folder name?


Yes, I did: DBNAME=Rumen_nt. My folder is named "Rumen_nt", and I saw some output files in it while kraken2-build was running.
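
A quick sanity check on the folder can rule out a wrong or empty $DBNAME. This is a minimal sketch with a hypothetical helper, `check_db_dir`. It assumes the usual layout in which kraken2-build keeps downloaded data in `library/` and `taxonomy/` subfolders of the database directory.

```shell
#!/usr/bin/env bash
# Report which expected subfolders are missing from a Kraken 2 database
# directory. check_db_dir is a hypothetical helper; library/ and taxonomy/
# are the subfolder names assumed here.
check_db_dir() {
    local db="$1" missing=0 sub
    for sub in library taxonomy; do
        if [ ! -d "$db/$sub" ]; then
            echo "missing: $db/$sub"
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out):
# DBNAME=Rumen_nt
# check_db_dir "$DBNAME" && echo "layout looks fine"
```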


@thitrucminh.nguyen were you able to fix the issue? I am facing the same thing using 50 cores and 500 GB of RAM.


May be worth trying with --fast-build:

https://github.com/DerrickWood/kraken2/issues/492

You may want to read my post on this issue.
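
For reference, here is a sketch of how the earlier build command might look with that option added. `build_cmd` is a hypothetical wrapper that only prints the command unless RUN=1 is set. The folder name and thread count are the ones from this thread. My understanding is that --fast-build gives up deterministic output across multi-threaded builds in exchange for speed, but please check the manual for your Kraken 2 version.

```shell
#!/usr/bin/env bash
# Dry-run wrapper: prints the kraken2-build command, and only executes it
# when RUN=1 is set in the environment. build_cmd is a hypothetical helper.
build_cmd() {
    local db="$1" threads="$2"
    local cmd=(kraken2-build --build --fast-build --threads "$threads" --db "$db")
    if [ "${RUN:-0}" = "1" ]; then
        "${cmd[@]}"
    else
        printf '%s\n' "${cmd[*]}"
    fi
}

# Prints the command for the database folder used in this thread.
build_cmd Rumen_nt 32
```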
