Hi, we are profiling the taxonomy of microbial communities on boreal mosses. These mosses' genomes have never been sequenced. We are using KneadData for preprocessing and Kraken2 for profiling.
Kraken2 is assigning a significant quantity of reads to Physcomitrium patens, a model organism for mosses. Since the P. patens genome surely has a lot in common with that of other mosses, this makes sense, as host DNA is bound to make its way to the sequencer.
The traditional approach to preprocessing is to map reads to host genomes in order to decontaminate the samples in silico, using a BWT-based aligner (Bowtie2, in our case).
Here are my two questions/topics of discussion:
- We want to use P. patens' genome to decontaminate the sample; do any arguments against that come to mind?
- We wonder which Bowtie2 mode (local or end-to-end) to use when doing so, and why.
Looking forward to reading your thoughts on the matter. Cheers!
You could use a tool like bbsplit.sh from the BBMap suite, which can bin the reads when you supply it with multiple genomes (in your case there may be only one). The advantage is that you can decide what to do with reads that multi-map within the genome (or across genomes). The options for that are: keep, randomly assign to one genome, assign to all genomes, or discard.
Start here: BBSplit syntax for generating builds for the reference genome and how to call different builds.
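To make the multi-mapping choice concrete, here is a rough sketch of how such a call might be assembled for a single host genome. It only builds and prints the command, it does not run it. The file names and the helper bbsplit_command are made up for illustration, and the mapping of the four behaviours above onto ambiguous=best/random/all/toss is my reading of the BBSplit help text, so please check it against the version you have installed.

```python
import shlex

# Values I believe BBSplit accepts for `ambiguous=` (assumption: verify with
# `bbsplit.sh --help`): best = keep first best site, random = pick one at
# random, all = assign to all sites, toss = discard the read.
AMBIGUOUS_MODES = {"best", "random", "all", "toss"}


def bbsplit_command(reads_1, reads_2, host_fasta, out_prefix, ambiguous="toss"):
    """Build (but do not run) a bbsplit.sh call for paired reads and one host genome.

    Hypothetical helper: all paths are placeholders supplied by the caller.
    """
    if ambiguous not in AMBIGUOUS_MODES:
        raise ValueError(f"ambiguous must be one of {sorted(AMBIGUOUS_MODES)}")
    return [
        "bbsplit.sh",
        f"in1={reads_1}",
        f"in2={reads_2}",
        f"ref={host_fasta}",
        f"ambiguous={ambiguous}",
        # Reads binned to the host; % becomes the reference name, # the mate number.
        f"basename={out_prefix}_%_R#.fq.gz",
        # Reads that matched no reference, i.e. the putative microbial fraction.
        f"outu1={out_prefix}_clean_R1.fq.gz",
        f"outu2={out_prefix}_clean_R2.fq.gz",
    ]


# Placeholder inputs; replace with your own files.
cmd = bbsplit_command("moss_R1.fq.gz", "moss_R2.fq.gz", "p_patens.fa", "moss")
print(shlex.join(cmd))
```

With only one reference, ambiguous= governs reads that multi-map within that genome. If you later add more references, there is a separate ambiguous2= setting for reads that hit several of them, which would need its own choice.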