I am trying to create a custom kraken2 DB based on Alveolata sequences that I fetched from NCBI. To narrow the problem down, I simplified the input to just the first sequence, for example:
>NW_027179382.1
GTAACCCGGTTGACTCTGCCGGTAGTATATGCTTGTCTCAAAGATTAAGCCATGCATGCGAAAGTATAAG
ACTTTATACGTCGAAACCGCAGACGGCTCATTAAAACAGTCATGATCTACACGCATATTGATCACACGGC
TAACCGTGGTAATTCTGGGGATAATACGTGCAGCTTCGGCTACTCTTTTTCAGAGTTGTTGTAGAAATCA
GCATTCACACTATCACCATTTGAATAAGTCTACAATTCAATTGCTTGTCAATGATGCGTTTGAATATCTG
ATCTATCAGTTCTGACGGTAGTGTAGTGGACTACCGTGACTGTAACGGATAACGGAGAATTAGGGTTCGA
TTCCGGAGAAGGAGCCTTAAAAACAGCTACTACATCTAAGGAAGGCAGCAGGCGCGCAAATTGCTCAATG
AAGGTCATTCGAAGCAGTGACAAGAAATATCAAAGCCAGCTTTCAGCTCGCTATTGATCTGAGGGTAATT
TAAAAACTTACTCGATTATTATTGGATCGCTAGTGGGGTGCCAGCCGGAGCGGTAATACCTCCTCCAATA
GTGTATGCTAAAATTGTTGCAGTTAAAACGCTCGTAGTCGTAGTTTCTTGACACTTTCAGCATGCCTAAC
Then I do:
kraken2-build --add-to-library alveolata.fasta --db Alveolata
kraken2-build --download-taxonomy --db Alveolata --threads 32
kraken2-build --build --db Alveolata --threads 32
But then the database seems empty:
bash-5.1$ kraken2-inspect --db Alveolata
Database options: nucleotide db, k = 35, l = 31
Spaced mask = 11111111111111111111111111111111110011001100110011001100110011
Toggle mask = 1110001101111110001010001100010000100111000110110101101000101101
Total taxonomy nodes: 116
Table size: 0
Table capacity: 12068
Min clear hash value = 0
What am I missing?
Try adding the taxonomy explicitly (from the manual):
Edit: This is not required as long as the headers contain NCBI accession numbers.
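If you do want to set the taxon explicitly anyway, the Kraken 2 manual describes appending `|kraken:taxid|<taxid>` to the sequence ID in each FASTA header. Below is a minimal sketch of that header rewrite, not a tested fix for the empty table. The taxid `12345`, the file names `example.fasta` and `example.tagged.fasta`, and the header description are placeholders I made up for illustration; you would substitute the real NCBI taxonomy ID and your own `alveolata.fasta`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder taxid (assumption): replace with the real NCBI taxonomy ID.
taxid=12345

# Tiny stand-in for alveolata.fasta so the sketch runs on its own.
cat > example.fasta <<'EOF'
>NW_027179382.1 example description
GTAACCCGGTTGACTCTGCC
ACTTTATACGTCGAAACCGC
EOF

# Append |kraken:taxid|<taxid> to the sequence ID (first word) of each
# header line; sequence lines are printed unchanged.
awk -v taxid="$taxid" '
  /^>/ { $1 = $1 "|kraken:taxid|" taxid }
  { print }
' example.fasta > example.tagged.fasta

# Show the rewritten header.
head -n 1 example.tagged.fasta
```

The tagged file would then be passed to `kraken2-build --add-to-library` in place of the original FASTA, followed by the same `--build` step as in the question.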
Thank you. However, Kraken2 is supposed to fetch the taxa information from NCBI. From the manual:
How can I get this to work, or does it really not work then?