Question: Kraken2 database curation
Asaf (8.4k, Israel) wrote 23 months ago:

Hi all, I'm working on mouse gut microbiome samples and want to use Kraken2 to get their taxonomic profiles. I'm using Kraken2 with the nt and bacteria databases (plus some others). The problem is that some genomes in nt contain integrated bacterial sequences. Some of these are easy to spot, such as hits to unlikely mammals (I assume no bat entered the lab), but parasites or fungi may also carry bacterial DNA, and those organisms might be genuinely relevant. My question: is there a neat way to remove these pseudo-bacterial sequences from the database, or to do some post-analysis that removes these nonspecific mappings?

Thanks

kraken metagenomics • 2.7k views
modified 17 months ago • written 23 months ago by Asaf
Asaf (8.4k, Israel) wrote 17 months ago:

So, five months later, I'm happy to introduce domain_classifier, a pretty simple naive Bayes classifier that tells whether a sequence is prokaryotic or eukaryotic. I also wrote a pipeline to build the Kraken2 database using civet, a pipeline management system that is internal to the Jackson Laboratory but also available on GitHub.

The package first predicts Pfam domains on predicted ORFs and then uses those domains to classify each sequence into a taxonomic domain. To filter the Kraken2 database, I simply remove DNA sequences whose predicted domain strongly disagrees with their reported taxonomy. This also removes mitochondrial and chloroplast genomes.
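To make the idea concrete, here is a minimal sketch of a naive Bayes classifier over sets of Pfam domain IDs, followed by the "strong disagreement" filter. This is not the actual domain_classifier code: the function names, the Laplace smoothing, the log-odds threshold, and the placeholder IDs (`PF_A`, `PF_X`, ...) are all assumptions made for illustration, and the ORF prediction and Pfam annotation steps are assumed to have been run already.

```python
import math
from collections import Counter

def train(labelled, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    labelled: iterable of (label, set_of_domain_ids), where label is
    'prokaryote' or 'eukaryote'. alpha is the Laplace smoothing constant.
    """
    domain_counts = {}        # label -> Counter of domain IDs
    class_counts = Counter()  # label -> number of training sequences
    vocab = set()
    for label, domains in labelled:
        class_counts[label] += 1
        domain_counts.setdefault(label, Counter()).update(domains)
        vocab.update(domains)
    return {"domain_counts": domain_counts, "class_counts": class_counts,
            "vocab": vocab, "alpha": alpha}

def log_odds(model, domains, pos="prokaryote", neg="eukaryote"):
    """Return log P(pos | domains) - log P(neg | domains)."""
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    score = 0.0
    for label, sign in ((pos, 1.0), (neg, -1.0)):
        counts = model["domain_counts"][label]
        denom = sum(counts.values()) + alpha * vocab_size
        log_post = math.log(model["class_counts"][label] / total)
        for d in domains:
            if d in model["vocab"]:  # ignore domains never seen in training
                log_post += math.log((counts[d] + alpha) / denom)
        score += sign * log_post
    return score

def keep_sequence(model, domains, reported, threshold=1.0):
    """Keep a sequence unless the classifier strongly contradicts its label."""
    s = log_odds(model, domains)
    if reported == "prokaryote":
        return s > -threshold
    return s < threshold

# Toy training data with placeholder domain IDs (not real Pfam accessions).
training = [
    ("prokaryote", {"PF_A", "PF_B"}),
    ("prokaryote", {"PF_A", "PF_C"}),
    ("eukaryote", {"PF_X", "PF_Y"}),
    ("eukaryote", {"PF_X", "PF_C"}),
]
model = train(training)

# A sequence filed under a eukaryote but carrying prokaryote-like domains.
suspicious = keep_sequence(model, {"PF_A", "PF_B"}, reported="eukaryote")
```

Working in log-odds lets a single threshold express "strongly disagree": sequences with ambiguous or uninformative domains score near zero and are kept, so only confident contradictions are dropped from the database.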
