Question: Kraken2 database curation
Asaf (Israel) wrote, 16 months ago:

Hi all, I'm working on mouse gut microbiome samples and want to use Kraken to get their taxonomic profile. I'm using kraken2 with the nt and bacteria databases (plus some others). The problem is that some non-bacterial genomes in nt contain integrated bacterial sequences. These are easy to spot when reads are assigned to implausible hosts such as unusual mammals (I assume no bat entered the lab), but parasites or fungi may also carry bacterial DNA, and those taxa could be genuinely relevant. Is there a neat way to remove these pseudo-bacterial sequences from the database, or to do some post-analysis that removes these nonspecific mappings?

Thanks

Tags: kraken, metagenomics
Asaf (Israel) wrote, 11 months ago:

So, five months later, I'm happy to introduce domain_classifier, a fairly simple naive Bayes classifier that tells whether a sequence is prokaryotic or eukaryotic. To build the kraken2 database I wrote a pipeline in civet, a pipeline management system that is internal to the Jackson Laboratory but also available on GitHub.

The package first predicts Pfam domains on predicted ORFs and then uses these domains to classify each sequence into a taxonomic domain. To filter the kraken DB, I simply remove DNA sequences whose predicted domain strongly disagrees with the reported taxonomy. This also removes mitochondrial and chloroplast genomes.
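
To make the idea concrete, here is a minimal sketch of that kind of filter in Python. It is not the actual domain_classifier code, which may well differ. It assumes ORF prediction and the Pfam search have already been run upstream, so each sequence arrives as a list of domain identifiers. The class and function names, the `DOM_*` identifiers, the toy training data and the `min_margin` threshold are all hypothetical and only there for illustration.

```python
import math
from collections import Counter


class DomainNaiveBayes:
    """Naive Bayes over domain identifiers with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing pseudo-count
        self.class_counts = Counter() # number of training sequences per label
        self.domain_counts = {}       # label -> Counter of domain occurrences
        self.vocab = set()            # all domains seen in training

    def fit(self, examples):
        # examples: iterable of (label, list_of_domain_ids)
        for label, domains in examples:
            self.class_counts[label] += 1
            self.domain_counts.setdefault(label, Counter()).update(domains)
            self.vocab.update(domains)
        return self

    def log_scores(self, domains):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.domain_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # class prior
            for d in domains:
                if d in self.vocab:      # ignore domains never seen in training
                    score += math.log((counts[d] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, domains):
        # Returns the best label and its log-score margin over the runner-up.
        scores = self.log_scores(domains)
        best = max(scores, key=scores.get)
        others = [v for k, v in scores.items() if k != best]
        margin = scores[best] - max(others) if others else float("inf")
        return best, margin


def filter_sequences(records, model, min_margin=2.0):
    """Keep sequence IDs unless the prediction strongly disagrees with the label.

    records: iterable of (seq_id, reported_domain, list_of_domain_ids).
    min_margin is a hypothetical threshold for "strongly disagrees".
    """
    kept = []
    for seq_id, reported, domains in records:
        predicted, margin = model.predict(domains)
        if predicted != reported and margin >= min_margin:
            continue  # strong disagreement: drop from the database
        kept.append(seq_id)
    return kept


# Toy training set with made-up domain identifiers.
training = [
    ("prokaryote", ["DOM_P1", "DOM_P2"]),
    ("prokaryote", ["DOM_P1", "DOM_S"]),
    ("eukaryote", ["DOM_E1", "DOM_E2"]),
    ("eukaryote", ["DOM_E1", "DOM_S"]),
]
model = DomainNaiveBayes().fit(training)

records = [
    ("seq1", "eukaryote", ["DOM_P1", "DOM_P2", "DOM_P1"]),  # looks prokaryotic
    ("seq2", "eukaryote", ["DOM_E1", "DOM_S"]),             # agrees with label
    ("seq3", "prokaryote", ["DOM_S"]),                      # uninformative
    ("seq4", "eukaryote", ["DOM_P1"]),                      # weak disagreement
]
kept = filter_sequences(records, model)
print(kept)
```

In this toy run only seq1 is dropped: its domains point to the other class with a large margin. seq4 also disagrees with its label, but on a single domain the margin stays below the threshold, so it is kept. Requiring a margin rather than any disagreement is what keeps the filter conservative for sequences with little domain evidence.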
