Question

Kraken 2 Strain Level Classification?

3.7 years ago

psun • 0

Hello,

I have a question about the Kraken 2 classifier; please let me know if I am thinking about this incorrectly. To build a custom Kraken 2 database, we need to do the following (here is the reference):

1) Install a taxonomy (NCBI)

2) Install one or more reference libraries (we can also include our own sequences in this step using FASTA files)

3) Build the database using the kraken2-build command
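The three steps above can be sketched as a sequence of kraken2-build invocations. The snippet below only assembles and prints the command lines; it does not run them. The database name, library name, and FASTA file name are placeholders chosen for illustration, and the exact options should be checked against the Kraken 2 manual for the installed version.

```python
import shlex
from typing import List


def kraken2_build_commands(db_name: str,
                           library: str,
                           custom_fastas: List[str],
                           threads: int = 1) -> List[List[str]]:
    """Return the kraken2-build command lines for a custom database.

    db_name, library and custom_fastas are placeholders supplied by the
    caller; nothing is executed here.
    """
    cmds = [
        # Step 1: install the NCBI taxonomy.
        ["kraken2-build", "--download-taxonomy", "--db", db_name],
        # Step 2a: install a standard reference library (e.g. "bacteria").
        ["kraken2-build", "--download-library", library, "--db", db_name],
    ]
    # Step 2b: add our own FASTA files to the library.
    for fasta in custom_fastas:
        cmds.append(["kraken2-build", "--add-to-library", fasta, "--db", db_name])
    # Step 3: build the database.
    cmds.append(["kraken2-build", "--build", "--threads", str(threads), "--db", db_name])
    return cmds


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in kraken2_build_commands("my_strain_db", "bacteria", ["my_strains.fa"], threads=4):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to pass them to subprocess.run later, once the paths have been checked.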

To add other genomes in step 2, the documentation says the sequences must be in FASTA or multi-FASTA files, and each sequence ID in the file(s) must contain either an NCBI accession number or an explicit assignment to a taxid. Suppose I had my own database with a column of strain sequences, a column of strain names, and another column with the matching NCBI accession numbers. Would I be able to add these sequences in step 2 by making my own FASTA file from this information?
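As a rough sketch of the conversion described above, the following turns rows of such a table into FASTA records. The column names (strain, accession, sequence, taxid) and the example accessions are hypothetical. When a taxid is supplied, the header uses the explicit kraken:taxid|NNN form described in the Kraken 2 manual; otherwise the accession alone is used as the sequence ID.

```python
from typing import Dict, Iterable


def rows_to_fasta(rows: Iterable[Dict[str, str]], width: int = 60) -> str:
    """Convert table rows into FASTA text suitable for --add-to-library.

    Each row is a dict with the hypothetical keys "strain", "accession",
    "sequence" and, optionally, "taxid".
    """
    lines = []
    for row in rows:
        accession = row["accession"].strip()
        strain = row["strain"].strip()
        # Remove any whitespace inside the sequence and normalise the case.
        seq = "".join(row["sequence"].split()).upper()
        if not accession or not seq:
            raise ValueError(f"row for strain {strain!r} lacks an accession or a sequence")
        taxid = (row.get("taxid") or "").strip()
        # Explicit taxid assignment if available, otherwise the accession only.
        seq_id = f"{accession}|kraken:taxid|{taxid}" if taxid else accession
        lines.append(f">{seq_id} {strain}")
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    example = [
        {"strain": "strain A", "accession": "ACC0001.1", "sequence": "acgt acgt", "taxid": ""},
        {"strain": "strain B", "accession": "ACC0002.1", "sequence": "GGGG", "taxid": "12345"},
    ]
    print(rows_to_fasta(example, width=4), end="")
```

For reads to be reported at the strain level, the accession or taxid would presumably need to resolve to a strain-level node in the taxonomy used to build the database, not to the parent species.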

Would it then be possible to get Kraken 2 to classify reads that match these strains from our own custom database? (The Kraken 2 documentation says that it does not classify reads at the strain level.)

I suppose I'm mostly confused about why some tools only classify to the species level when we can make our own database that provides sequences at the strain level (unless the classifier is not able to look up the strain information from NCBI in order to classify the reads properly). Please let me know if there is any gap in my understanding.

Thank you.

UPDATE: Kraken 2 allows strain-level classification if you use your own custom database, as long as the k-mers are unique enough to distinguish the strains.
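To illustrate what "unique enough" means, here is a simplified sketch that counts the k-mers found in exactly one strain of a set. It is not how Kraken 2 is implemented: Kraken 2 indexes minimizers and assigns shared ones to the lowest common ancestor, and this toy version ignores reverse complements. The function names and the toy sequences are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strain_specific_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each strain name to the k-mers that occur in no other strain."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    unique = {name: set() for name in genomes}
    for km, names in owners.items():
        # A k-mer shared by several strains cannot tell them apart.
        if len(names) == 1:
            unique[next(iter(names))].add(km)
    return unique


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    toy = {"strain_A": "ACGTAC", "strain_B": "ACGTTT"}
    for name, ks in strain_specific_kmers(toy, k=3).items():
        print(name, sorted(ks))
```

A read that contains only k-mers shared by both strains can at best be placed at their common ancestor, which is why near-identical strains tend to collapse to the species level.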

kraken 2 classifier strain • 1.6k views

The answer was found in the Bracken 2 GitHub issues.

For this reason the post is now closed.

Cheers!

3.7 years ago by psun • 0

Closing a post is an action used by moderators for other reasons. If you found an answer for your question then please post the complete GitHub link as an answer so anyone finding this post in future will be able to get the answer.

3.7 years ago by GenoMax 141k