Question

Filtering out non-bacterial genomes from metagenomic data


3.0 years ago

reecemccu • 0

Hello all, does anyone know of a program that can filter out all the non-bacterial genomes from metagenomic data?

Thanks!

metagenomic bacteria • 1.7k views

updated 3.0 years ago by boaty ▴ 220 • written 3.0 years ago by reecemccu • 0

score 1 · Answer 1 · 2021-05-05


3.0 years ago

Mensur Dlakic ★ 27k

I only know how to do this after the assembly, which may not be what you are asking. I am assuming that you mean non-prokaryotic rather than non-bacterial, but the answer is probably the same.

The first step is to bin the contigs by their tetranucleotide/pentanucleotide (4-mer/5-mer) frequencies. Even related bacterial species can be separated this way, and it is almost guaranteed that any eukaryotic sequence will be well separated from the rest. The same is true for archaeal bins, in case you really meant non-bacterial genomes. Bins can then be classified using GTDB-Tk (the GTDB Toolkit), where eukaryotes will usually be classified as belonging to the Asgard/Loki group.
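To make the composition idea concrete, here is a minimal Python sketch of the feature that this kind of binning relies on: a normalized k-mer frequency profile per contig. It is only an illustration, not what any particular binner does internally. The function name `kmer_frequencies` is made up for this example, and real tools typically also merge reverse-complement k-mers and combine composition with coverage.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return a dict mapping every A/C/G/T k-mer to its relative frequency.

    Hypothetical helper for illustration. Windows containing other
    characters (e.g. N) are ignored. Reverse complements are NOT merged,
    which is a simplification.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize over the valid k-mers only.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTACGTTTGACCA")
    # 4**4 = 256 features per contig; these vectors are what gets clustered.
    print(len(profile))
```

Contigs from the same genome tend to have similar profiles, so clustering these vectors (usually together with read coverage) is what groups them into bins.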



What about mapping the reads with Kraken2 first and sieving out everything that matches to bacteria?

reply 3.0 years ago by Dunois ★ 2.5k


Generally speaking, I am not in favor of removing the reads when it is known that the underlying database used by Kraken2 is not current (the only one I can find is about a year old). Even if the database is current, there is always a possibility that a sample contains a truly novel bacterium which is not in the database, and those sequences would be thrown out.

In my experience, there is no problem in assembling a mix of prokaryotic and eukaryotic reads and separating them later, after binning. For that matter, separating archaeal and bacterial bins is usually not a problem either. Just so I am not hand-waving, see if you can spot a group labeled 67 at around 8 o'clock in the image below. That is the only eukaryote in a mix of prokaryotes in this metagenome, and I hope it is obvious how cleanly it is separated from the others. Most archaeal and bacterial bins also separate cleanly from each other.

[Image: plot of the binned metagenome referred to above]

reply 3.0 years ago by Mensur Dlakic ★ 27k

score 1 · Answer 2 · 2021-05-05

Hi reecemccu, you can perform the separation at the read level using Kraken2. Download a prebuilt Kraken2 and Bracken database (I suggest the standard database, not the mini one) here: https://benlangmead.github.io/aws-indexes/k2 (Dec 2020).

Then run kraken2 with:

kraken2 --db {kraken2_database_path} --unclassified-out {uncseq} --classified-out {cseq} --use-names --threads {threads} --output {output.txt} --report {output.kreport} {input.fq}

Kraken2 will then classify your reads. The per-read results in {output.txt} list each read ID together with the taxon it was assigned to, so you can use that file to select the reads you want from {cseq}.
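As a rough sketch of that last selection step, the Python below collects the IDs of reads assigned to Bacteria or anything beneath it. It assumes the standard tab-separated Kraken2 per-read output (status, read ID, taxon, length, LCA mapping) and the standard report in which the taxon name is the last column and is indented by depth. The function names are hypothetical; taxid 2 is the NCBI taxonomy ID for Bacteria.

```python
import re

BACTERIA_TAXID = "2"  # NCBI taxonomy ID for Bacteria
# With --use-names the taxon column looks like "Escherichia coli (taxid 562)".
TAXID_RE = re.compile(r"\(taxid (\d+)\)\s*$")


def taxids_under(report_lines, root_taxid=BACTERIA_TAXID):
    """Return root_taxid plus every taxid nested below it in a Kraken2 report.

    Hypothetical helper: relies on the indentation of the name column.
    """
    selected = set()
    root_depth = None
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        taxid, name = fields[-2].strip(), fields[-1]
        depth = len(name) - len(name.lstrip(" "))
        if root_depth is not None:
            if depth > root_depth:
                selected.add(taxid)  # still inside the subtree
                continue
            root_depth = None  # indentation dropped back: subtree ended
        if taxid == root_taxid:
            root_depth = depth
            selected.add(taxid)
    return selected


def read_ids_for_taxids(kraken_lines, taxids):
    """Return IDs of classified reads whose assigned taxid is in taxids."""
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip malformed lines and unclassified ("U") reads
        match = TAXID_RE.search(fields[2])
        taxid = match.group(1) if match else fields[2].strip()
        if taxid in taxids:
            ids.add(fields[1])
    return ids


if __name__ == "__main__":
    # File names are placeholders for {output.kreport} and {output.txt}.
    with open("output.kreport") as report:
        bacterial = taxids_under(report)
    with open("output.txt") as per_read:
        keep = read_ids_for_taxids(per_read, bacterial)
    print(len(keep), "reads assigned to Bacteria")
```

The resulting ID set can then be used to pull the matching records out of {cseq}. Note that reads assigned above Bacteria (for example to root) and unclassified reads are not selected by this sketch, so a truly novel bacterium missing from the database would be lost.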