I would like to know which reference database is used in these standard operating procedures for metagenomic analysis of microbiome samples: https://github.com/merckey/microbiome_helper/wiki/Metagenomics-Standard-Operating-Procedure-v3
Here is what the procedures list for using Kraken2 to obtain "raw" taxonomy profiles:
*copy the Kraken2 database to the mounted ramdisk in your scratch location (on SSD drive) to put the database directly into RAM, which speeds up processing enormously as the database normally has to be read for each sample:
cp -r /home/shared/Kraken2.0.8_Bracken150mer_RefSeqCompleteV93
It says the database is about 800GB. My question is: which NCBI databases does it consist of, exactly?
In the Kraken2 manual I found this list of standard genome sets that can be used to build a custom database, so it may be made up of some of them:
- bacteria: RefSeq complete bacterial/archaeal genomes
- plasmids: RefSeq plasmid sequences
- viruses: RefSeq complete viral genomes
- human: GRCh38 human genome
Would it be made up of any of these? Or all of them?
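
One way I could imagine checking this directly, assuming Kraken2 is installed and the database is readable, is `kraken2-inspect`, which prints a report-style summary of what a database contains. Below is a minimal sketch of that idea. `list_domains` is a hypothetical helper name of my own, and it assumes the usual six tab-separated report columns (percentage, two counts, rank code, taxid, indented name). It would only show which top-level groups are present; plasmid sequences would presumably be counted under their host organisms, so this would not reveal them separately.

```shell
# Hypothetical helper: read a Kraken2-style report on stdin, keep only the
# domain-level rows (rank code "D" in column 4), and print "taxid<TAB>name"
# with the indentation stripped from the name.
list_domains() {
    awk -F'\t' '$4 == "D" { name = $6; sub(/^ +/, "", name); print $5 "\t" name }'
}

# Assumed usage (requires Kraken2; the path is the one quoted from the SOP):
# kraken2-inspect --db /home/shared/Kraken2.0.8_Bracken150mer_RefSeqCompleteV93 | list_domains
```

If the output listed Bacteria, Archaea, Viruses and Eukaryota, that would at least suggest which of the sets above went into the build.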