Kraken2 RefSeq database build has some missing data
11 months ago

Hi everyone, I have a small question about building a RefSeq database in Kraken2. I downloaded the RefSeq libraries with commands like kraken2-build --download-library bacteria --db $DBNAME (and likewise for the fungi and viral libraries). Unfortunately, I can't find some of the organisms I'm interested in (e.g. Pneumocystis jirovecii, which causes Pneumocystis pneumonia, PJP). Has anyone run into the same problem?
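One way to confirm whether a taxon actually made it into a built database is to dump its contents with kraken2-inspect --db $DBNAME > inspect.txt and search the names. The sketch below assumes the usual tab-separated, report-style output with the taxid in the fifth column and the indented name in the sixth; the function name find_taxa and the file name inspect.txt are illustrative, not part of Kraken2.

```python
def find_taxa(report_lines, query):
    """Return (taxid, name) pairs whose name contains `query` (case-insensitive).

    Assumes tab-separated lines with at least six columns, where column 5
    is the taxid and column 6 is the (indented) scientific name.
    """
    needle = query.lower()
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            # Skip comment/header lines or anything not in the assumed layout.
            continue
        taxid, name = fields[4], fields[5].strip()
        if needle in name.lower():
            hits.append((taxid, name))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_taxa.py inspect.txt pneumocystis
    with open(sys.argv[1]) as handle:
        for taxid, name in find_taxa(handle, sys.argv[2]):
            print(taxid, name, sep="\t")
```

If the search returns nothing, the genome was probably never downloaded into the library, rather than being lost during the build.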

I have had bad experiences with some automatic classification tools such as Kraken, Kaiju, etc., even with mock data from known organisms. My humble suggestion: don't use automatic classification tools.

Unfortunately, I have already processed the Kraken2 RefSeq database myself (not automatically), but the database seems too large to build (over 1 TB); my RAM cannot handle it.

Holy moly. Talk about huge. Could you split this db maybe? And how exactly did you process it? You got my curiosity.

I tried splitting the database FASTA (.fa) into several parts, but the last step of the process needs to merge the parts and build the Bracken database, so I can't see a way around this step. In my case, I use the following header format for the database sequences: >sequence16|kraken:taxid|32630 (FASTA). As I said, it is too large (over 100 thousand microbial genomes need to be built).
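For reference, the header format mentioned above can be produced with a small script. This is a minimal sketch: the names tag_header and tag_fasta and the taxid_for mapping (sequence ID to NCBI taxid) are hypothetical, and it assumes every header has a non-empty sequence ID that appears in the mapping.

```python
def tag_header(header, taxid):
    """Append '|kraken:taxid|<taxid>' to the sequence ID of a FASTA header line."""
    body = header[1:].rstrip("\n")
    # The sequence ID is everything up to the first space; the rest is description.
    seq_id, _, description = body.partition(" ")
    tagged = f">{seq_id}|kraken:taxid|{taxid}"
    return f"{tagged} {description}" if description else tagged


def tag_fasta(lines, taxid_for):
    """Yield FASTA lines with every header tagged using the taxid_for mapping."""
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            yield tag_header(line, taxid_for[seq_id]) + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    fasta = [">sequence16 Adapter sequence\n", "CAAGCAGAAGACGGCATACGAGAT\n"]
    for out in tag_fasta(fasta, {"sequence16": 32630}):
        print(out, end="")
```

Files tagged this way can then be added with kraken2-build --add-to-library, one FASTA file at a time.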