Low percentage of classified reads (~18%) after Kraken2 analysis – is this expected?
7 hours ago

Hello,

I am working with shotgun metagenomic data from rhizospheric soil samples. I preprocessed the data by removing low-quality reads and adapters. I also removed human genome contamination (only ~0.08% of reads were filtered out).

For taxonomic classification, I used Kraken2 with a custom database that I built from all NCBI organisms. The database size was ~1.5 TB, so I expected it to be quite comprehensive.

After running Kraken2 on the preprocessed and human-filtered reads, I observed that only ~18% of the reads were classified, while ~82% remained unclassified.

My questions are:

Is it normal to get such a low percentage of classified reads in soil metagenomic data?

Could there be an issue with my database construction or the way I ran Kraken2?

What are the possible reasons why ~80% of my reads remain unclassified despite using a large, comprehensive database?

Any advice, possible explanations, or shared experiences with similar soil metagenomic datasets would be greatly appreciated.

Thanks!

Kraken2 metagenomics shotgun • 97 views

Hard to tell, since we don't know how you constructed your database or what is actually in it. I would suggest rerunning the same reads against the standard Kraken2 database. If the classification rate is significantly higher, that suggests something went wrong during your database creation.
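To put numbers on that comparison, here is a minimal sketch that reads the classified fraction out of a Kraken2 report (the file written by `--report`). It assumes the usual tab-separated layout in which the unclassified row has rank code `U` and the root row has rank code `R` with taxid 1. The function name and the file names in the usage comment are placeholders, not anything taken from your run.

```python
def classified_fraction(report_text):
    """Return the fraction of reads classified, given Kraken2 report text.

    Assumes tab-separated rows whose last three columns are rank code,
    taxid and name, and whose second column is the clade read count.
    Indexing from the end keeps this working if the report also carries
    the two extra minimizer columns.
    """
    unclassified = 0
    classified = 0
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank_code = fields[-3].strip()
        taxid = fields[-2].strip()
        if rank_code == "U":
            unclassified = clade_reads
        elif rank_code == "R" and taxid == "1":
            classified = clade_reads  # everything assigned at or below root
    total = classified + unclassified
    return classified / total if total else 0.0


# Example usage (file names are placeholders):
# with open("custom_db.report") as fh:
#     custom = classified_fraction(fh.read())
# with open("standard_db.report") as fh:
#     standard = classified_fraction(fh.read())
# print(f"custom: {custom:.1%}  standard: {standard:.1%}")
```

Running it on the reports from both databases for the same sample gives a like-for-like comparison.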


why ~80% of my reads remain unclassified

Have you taken some of those unclassified reads and run blast+ searches via the NCBI web interface to see whether they return sensible hits?
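If it helps, here is a rough sketch for pulling a small random subset of reads into FASTA so they can be pasted into the web BLAST form. It assumes you have the unclassified reads in a plain four-line-per-record FASTQ file (Kraken2 can write these with `--unclassified-out`). The function name and file name are hypothetical.

```python
import random


def sample_fastq_as_fasta(fastq_lines, n, seed=0):
    """Reservoir-sample up to n reads from FASTQ lines and return FASTA text.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    rng = random.Random(seed)
    sample = []
    seen = 0
    it = iter(fastq_lines)
    while True:
        header = next(it, None)
        if header is None:
            break
        if not header.strip():
            continue  # tolerate stray blank lines, e.g. at end of file
        seq = next(it, "")
        next(it, None)  # '+' separator line, ignored
        next(it, None)  # quality line, ignored
        record = (header.strip()[1:], seq.strip())  # drop leading '@'
        seen += 1
        if len(sample) < n:
            sample.append(record)
        else:
            # Replace an existing entry with probability n / seen
            j = rng.randrange(seen)
            if j < n:
                sample[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)


# Example usage (file name is a placeholder):
# with open("unclassified.fq") as fh:
#     print(sample_fastq_as_fasta(fh, n=20))
```

Reservoir sampling keeps memory use flat, so it does not matter how large the unclassified file is.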

I used Kraken2 with a custom database that I built from all NCBI organisms.

What sequences did you use? Genomes from RefSeq, nt/nr, or something else?
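One way to check what actually ended up in the database is to run `kraken2-inspect --db` on it and summarise the output. The sketch below assumes that output is tab-separated with the columns percentage, clade minimizer count, taxon minimizer count, rank code, taxid and name, that header lines start with `#`, and that domain-level rows carry rank code `D`. Treat those as assumptions to verify against your own file; the function name is hypothetical.

```python
def minimizers_by_domain(inspect_text):
    """Map each domain-level taxon name to its clade minimizer count.

    Assumes kraken2-inspect style rows:
    percent, clade count, taxon count, rank code, taxid, name.
    """
    totals = {}
    for line in inspect_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip malformed lines
        if fields[3].strip() == "D":  # assumed rank code for domain rows
            totals[fields[5].strip()] = int(fields[1])
    return totals


# Example usage (file name is a placeholder):
# with open("custom_db.inspect.txt") as fh:
#     for name, count in minimizers_by_domain(fh.read()).items():
#         print(name, count)
```

If bacteria, archaea or fungi are missing or tiny in that summary, the build probably did not include what you expected for a soil sample.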
