Question: Kraken2 database curation
11 months ago, Asaf (Israel) wrote:

Hi all, I'm working on mouse gut microbiome samples and want to use Kraken to get their taxonomic profile. I'm using Kraken2 with the nt and bacteria databases (plus some others). The problem is that some genomes in nt contain integrated bacterial sequences. These are easy to spot when the host is an implausible mammal (I assume no bat entered the lab), but some parasites or fungi may carry bacterial DNA too, and those might be relevant. My question: is there a neat way to remove these pseudo-bacterial sequences from the database, or to do some post-analysis that removes these unspecific mappings?

Thanks

kraken metagenomics • 1.3k views
5 months ago, Asaf (Israel) wrote:

So, five months later, I'm happy to introduce domain_classifier, a fairly simple naive Bayes classifier that tells whether a sequence is prokaryotic or eukaryotic. I also wrote a civet pipeline to build the Kraken2 database (civet is a pipeline management system internal to the Jackson Laboratory, but also available on GitHub).

The package first predicts Pfam domains on predicted ORFs and then uses these domains to classify each sequence into a taxonomic domain. To filter the Kraken database, I simply remove DNA sequences whose predicted domain strongly disagrees with their reported taxonomy. This also removes mitochondrial and chloroplast genomes.
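
To illustrate the idea, here is a minimal sketch of the filtering step. It is not the actual domain_classifier code. It assumes the Pfam domains for each sequence have already been predicted upstream and are passed in as plain lists of Pfam accessions. The function names (`train`, `log_scores`, `keep_sequence`), the Laplace smoothing and the `min_log_odds` threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def train(labelled, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    labelled: iterable of (label, [pfam_ids]) pairs, e.g. ("prokaryote", [...]).
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    counts = {}         # label -> Counter of Pfam IDs seen under that label
    n_seqs = Counter()  # label -> number of training sequences
    vocab = set()
    for label, domains in labelled:
        counts.setdefault(label, Counter()).update(domains)
        n_seqs[label] += 1
        vocab.update(domains)
    return {"counts": counts, "n_seqs": n_seqs, "vocab": vocab, "alpha": alpha}

def log_scores(model, domains):
    """Return the unnormalised log posterior of each label for one sequence."""
    total = sum(model["n_seqs"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, c in model["counts"].items():
        denom = sum(c.values()) + alpha * vocab_size
        score = math.log(model["n_seqs"][label] / total)  # class prior
        for d in domains:
            if d in model["vocab"]:  # domains never seen in training are ignored
                score += math.log((c[d] + alpha) / denom)
        scores[label] = score
    return scores

def keep_sequence(model, reported, domains, min_log_odds=2.0):
    """Drop a sequence only when the evidence strongly favours another label."""
    scores = log_scores(model, domains)
    others = [s for label, s in scores.items() if label != reported]
    if reported not in scores or not others:
        return True  # nothing to compare against, so keep the sequence
    return max(others) - scores[reported] < min_log_odds

# Toy training data; the Pfam accessions are only placeholders here.
training = [
    ("prokaryote", ["PF00005", "PF00072", "PF00512"]),
    ("prokaryote", ["PF00072", "PF00512"]),
    ("eukaryote", ["PF00069", "PF00400", "PF00076"]),
    ("eukaryote", ["PF00069", "PF00076"]),
]
model = train(training)

# A sequence labelled eukaryote whose domains all look prokaryotic is dropped.
suspicious = keep_sequence(model, "eukaryote", ["PF00072", "PF00512", "PF00005"])
# A sequence with weak or no domain evidence is kept.
ambiguous = keep_sequence(model, "eukaryote", [])
```

Requiring a log-odds margin rather than a simple majority keeps sequences with few or ambiguous domains in the database, so only strong disagreements are removed.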
