kraken2 read assignment: difference between G and G1?
5 months ago
cjb ▴ 10

Here is a snippet of a kraken2 report.

0.74  605528  0       O   85011     Streptomycetales
0.74  605528  10003   F   2062        Streptomycetaceae
0.71  581926  131552  G   1883          Streptomyces
0.21  168546  40902   G1  2593676         unclassified Streptomyces
0.01  9744    9744    S   2005885           Streptomyces sp. S063
0.01  8561    8561    S   2742137           Streptomyces sp. NA02950
0.01  4360    4360    S   2609808           Streptomyces sp. LBUM 1480
0.01  4137    4137    S   659352            Streptomyces sp. SN-593
0.00  3867    3867    S   2078691           Streptomyces sp. CB01881
0.00  3086    3086    S   2175864           Streptomyces sp. NHF165
0.00  3077    3077    S   2721244           Streptomyces sp. RPA4-2
0.00  2753    2753    S   2136173           Streptomyces sp. So13.3
0.00  2656    2656    S   1972846           Streptomyces sp. Sge12


I don't understand why there are reads classified as Streptomyces (G) and also as "unclassified Streptomyces" (G1). I can't find any information on this in the kraken2 wiki.

One guess is that if an entry in the kraken2 database were classified only to the genus level (taxon 1883), then reads could be assigned to it. However, all of the Streptomyces entries in the database are classified to the species or strain level.

Any ideas?

5 months ago
cjb ▴ 10

It has to do with the structure of the NCBI taxonomy. Many entries in the taxonomy are named Streptomyces something-or-other and have "unclassified Streptomyces" (taxid 2593676) as their taxonomic parent, rather than the genus Streptomyces (taxid 1883). The taxonomic parent of 2593676 is itself 1883. I don't know the reason for the extra layer, but in the example above G1 is indented under G, so the G1 counts are included in the G counts. So there is probably nothing to worry about, unless taxonomic divisions that are not based on phylogenetics disturb you, which they should.
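To check the nesting yourself, here is a minimal Python sketch that rebuilds the parent/child relationships from the name indentation. It assumes the standard six-column, tab-separated kraken2 report (percentage, reads in the clade, reads assigned directly to the taxon, rank code, taxid, indented name). The helper names `parse_report` and `attach_parents` are made up for illustration, and the embedded `REPORT` string is just three lines from the snippet in the question.

```python
# Three lines copied from the report in the question, written tab-separated
# (assumption: the real report file uses tabs and indents names with spaces).
REPORT = (
    "0.71\t581926\t131552\tG\t1883\t      Streptomyces\n"
    "0.21\t168546\t40902\tG1\t2593676\t        unclassified Streptomyces\n"
    "0.01\t9744\t9744\tS\t2005885\t          Streptomyces sp. S063\n"
)


def parse_report(text):
    """Parse kraken2 report lines into dicts, keeping the name indentation."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")
        rows.append({
            "pct": float(pct),
            "clade_reads": int(clade),    # reads in this taxon and all descendants
            "direct_reads": int(direct),  # reads assigned to this taxon itself
            "rank": rank,
            "taxid": int(taxid),
            "name": name.strip(),
            "indent": len(name) - len(name.lstrip(" ")),
        })
    return rows


def attach_parents(rows):
    """Set row['parent'] to the taxid of the nearest less-indented row above."""
    stack = []
    for row in rows:
        while stack and stack[-1]["indent"] >= row["indent"]:
            stack.pop()
        row["parent"] = stack[-1]["taxid"] if stack else None
        stack.append(row)
    return rows


rows = attach_parents(parse_report(REPORT))
by_taxid = {r["taxid"]: r for r in rows}

genus = by_taxid[1883]
g1 = by_taxid[2593676]

# Reads under the genus that are neither assigned to the genus itself
# nor counted under "unclassified Streptomyces".
other_children = genus["clade_reads"] - genus["direct_reads"] - g1["clade_reads"]

print("parent of G1:", g1["parent"])
print("reads in other children of the genus:", other_children)
```

The point of the sketch is only that the second column of a parent row already contains the second column of every row indented beneath it, so adding the G and G1 percentages would double count.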