Filtering out non-bacterial genomes from metagenomic data
5 months ago
reecemccu • 0

Hello all, does anyone know of a program that can filter out all the non-bacterial genomes from metagenomic data?

Thanks!

metagenomic bacteria • 503 views
5 months ago
Mensur Dlakic ★ 14k

I only know how to do this after the assembly, which may not be what you are asking. I am assuming that you mean non-prokaryotic rather than non-bacterial, but the answer is probably the same.

The first step is to bin the contigs by tetranucleotide/pentanucleotide (4-mer/5-mer) frequencies. Even related bacterial species can be separated this way, and it is almost guaranteed that any eukaryotic sequence will be well separated from the rest. The same is true for archaeal bins, in case you really meant non-bacterial genomes. Bins can then be classified using GTDB-Tk (the GTDB Toolkit), where eukaryotic bins will usually be classified as the Asgard/Loki group.
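To make the binning signal concrete, here is a minimal Python sketch of how a tetranucleotide frequency profile could be computed for a single contig. The function name `kmer_frequencies` is hypothetical and this is not the method of any particular binning tool; real binners typically also merge reverse-complement k-mers and use read coverage, which this sketch leaves out.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for one contig as a dict.

    Hypothetical helper for illustration only: it counts overlapping
    k-mers on the forward strand and ignores windows with ambiguous bases.
    """
    sequence = sequence.upper()
    # All 4**k possible k-mers, so every contig gets a vector of equal length.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N, etc.
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}
```

Contigs whose profiles are close to each other (for example after dimensionality reduction and clustering) would end up in the same bin; the clustering step itself is outside the scope of this sketch.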


What about mapping the reads with Kraken2 first and sieving out everything that matches bacteria?


Generally speaking, I am not in favor of removing the reads when it is known that the underlying database used by Kraken2 is not current (the only one I can find is about a year old). Even if the database is current, there is always a possibility that a sample contains a truly novel bacterium which is not in the database, and those sequences would be thrown out.

In my experience, there is no problem in assembling a mix of prokaryotic and eukaryotic reads and separating them later, after binning. For that matter, separating archaeal and bacterial bins is usually not a problem either. Just so I am not hand-waving, see if you can spot a group labeled 67 at around 8 o'clock in the image below. That is the only eukaryote in a mix of prokaryotes in this metagenome, and I hope it is obvious how cleanly it is separated from the others. Most archaeal and bacterial bins also separate cleanly from each other.

5 months ago
boaty ▴ 170

You can also run Kraken2 with:

kraken2 --db {kraken2_database_path} --unclassified-out {uncseq} --classified-out {cseq} --use-names --threads {threads} --output {output.txt} --report {output.kreport} {input.fq}

Kraken2 will then classify your reads into different taxa. You can select them later from {cseq} by using the per-read assignments written to {output.txt}.
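As a rough illustration of that selection step, the Python sketch below collects the IDs of classified reads from the per-read Kraken2 output. It assumes the standard tab-separated layout (classification flag, read ID, taxon, length, k-mer mappings), with the taxon column reading like "Escherichia coli (taxid 562)" when --use-names is set. The names `parse_taxid` and `select_read_ids` are hypothetical. Note that each read carries the taxid of its own assigned taxon, not of its domain, so `wanted_taxids` would have to contain every bacterial taxid of interest; building that set from the taxonomy is not shown here.

```python
import re

# With --use-names, column 3 looks like "Escherichia coli (taxid 562)".
TAXID_PATTERN = re.compile(r"\(taxid (\d+)\)\s*$")


def parse_taxid(field):
    """Extract the numeric taxid from column 3, with or without --use-names."""
    field = field.strip()
    match = TAXID_PATTERN.search(field)
    return int(match.group(1)) if match else int(field)


def select_read_ids(kraken_lines, wanted_taxids):
    """Return IDs of classified reads whose assigned taxid is in wanted_taxids.

    kraken_lines: any iterable of lines from the per-read output file.
    wanted_taxids: a set of integer taxids supplied by the caller (assumption:
    it already includes all descendant taxa you want to keep).
    """
    selected = []
    for line in kraken_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 3 or columns[0] != "C":
            continue  # skip malformed lines and unclassified ("U") reads
        if parse_taxid(columns[2]) in wanted_taxids:
            selected.append(columns[1])
    return selected
```

In practice you would pass an open file handle for {output.txt} as `kraken_lines`, then use the returned IDs to pull the matching records out of {cseq} or the original FASTQ.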