Filtering out non-bacterial genomes from metagenomic data
5 months ago
reecemccu • 0

Hello all, does anyone know of a program that can filter out all the non-bacterial genomes from metagenomic data?

Thanks!

metagenomic bacteria • 503 views
5 months ago
Mensur Dlakic ★ 14k

I only know how to do this after the assembly, which may not be what you are asking. I am assuming that you mean non-prokaryotic rather than non-bacterial, but the answer is probably the same.

The first step is to bin the contigs by 4-mer/5-mer (tetra-/pentanucleotide) frequencies. Even related bacterial species can be separated this way, and it is almost a guarantee that any eukaryotic sequence will be well separated from the rest. The same is true for archaeal bins, in case you really meant non-bacterial genomes. The bins can then be classified using GTDB-Tk (the GTDB Toolkit), where eukaryotes will usually be classified as part of the Asgard/Loki group.
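To make the composition idea concrete, here is a minimal Python sketch of the kind of k-mer frequency profile that composition-based binning relies on. It is only an illustration, not the method of any particular binner: the function names (`revcomp`, `canonical`, `kmer_profile`) are made up for this example, k-mers are merged with their reverse complements, and windows containing anything other than A/C/G/T are skipped. Real binning tools typically combine such profiles with read coverage.

```python
from collections import Counter
from itertools import product

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement (strand-independent key)."""
    return min(kmer, revcomp(kmer))


def kmer_profile(seq, k=4):
    """Relative frequency of every canonical k-mer in one contig.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    Returns a dict with one entry per canonical k-mer (136 entries for k=4).
    """
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

Calling `kmer_profile(contig_sequence)` on each contig gives vectors of equal length that can be clustered or projected into two dimensions; contigs from the same genome tend to land close together.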


What about classifying the reads with Kraken2 first and sieving out everything that matches bacteria?


Generally speaking, I am not in favor of removing the reads when it is known that the underlying database used by Kraken2 is not current (the only one I can find is about a year old). Even if the database is current, there is always a possibility that a sample contains a truly novel bacterium which is not in the database, and those sequences would be thrown out.

In my experience, there is no problem in assembling a mix of prokaryotic and eukaryotic reads and separating them later, after binning. For that matter, separating archaeal and bacterial bins is usually not a problem either. Just so I am not hand-waving, see if you can spot a group labeled 67 at around 8 o'clock in the image below. That is the only eukaryote in a mix of prokaryotes in this metagenome, and I hope it is obvious how cleanly it is separated from the others. Most archaeal and bacterial bins also separate cleanly from each other.

[Image: plot of the binned contigs, not reproduced here]

5 months ago
boaty ▴ 170

Hi reecemccu, you can perform the separation at the read level with Kraken2. Download a ready-made Kraken2 and Bracken database (I suggest the standard database, not the mini one) here: https://benlangmead.github.io/aws-indexes/k2 (Dec 2020).

Then run Kraken2 with

kraken2 --db {kraken2_database_path} --unclassified-out {uncseq} --classified-out {cseq} --use-names --threads {threads} --output {output.txt} --report {output.kreport} {input.fq}

Kraken2 will then assign your reads to taxa, and you can later pick out the reads you want from {cseq} using the per-read assignments in {output.txt}.
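As a rough sketch of that selection step, the Python below collects the IDs of reads assigned anywhere inside the Bacteria clade. It assumes the default Kraken2 formats: a tab-separated report whose last two columns are the taxid and a name indented two spaces per level, and a per-read output whose third column looks like `Name (taxid N)` because of `--use-names`. The function names are made up for this example, and it is worth checking a few lines of your own files against these assumptions before relying on it.

```python
import re

# With --use-names, the third output column looks like "Escherichia coli (taxid 562)".
TAXID_RE = re.compile(r"\(taxid (\d+)\)\s*$")


def clade_taxids(kreport_lines, root_name="Bacteria"):
    """Collect taxids of root_name and every taxon nested below it.

    Relies on the report being ordered depth-first, with deeper taxa
    indented further than their parent in the name column.
    """
    taxids = set()
    root_depth = None
    for line in kreport_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        taxid, raw_name = fields[-2].strip(), fields[-1]
        depth = len(raw_name) - len(raw_name.lstrip(" "))
        name = raw_name.strip()
        if root_depth is None:
            if name == root_name:
                root_depth = depth
                taxids.add(taxid)
        elif depth > root_depth:
            taxids.add(taxid)
        else:
            break  # left the clade: back at the root's level or above
    return taxids


def read_taxid(field):
    """Extract the taxid from 'Name (taxid N)' or from a bare taxid."""
    match = TAXID_RE.search(field)
    return match.group(1) if match else field.strip()


def reads_in_clade(kraken_output_lines, taxids):
    """Return IDs of classified reads whose assigned taxid is in taxids."""
    ids = []
    for line in kraken_output_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue
        if read_taxid(fields[2]) in taxids:
            ids.append(fields[1])
    return ids
```

Open {output.kreport} and {output.txt} with `open(...)` and pass the file objects to `clade_taxids` and `reads_in_clade` respectively; the resulting ID list can then be used to pull the matching records out of {cseq}. Reads that Kraken2 left unclassified are not in this list, so keep {uncseq} if you are worried about novel organisms missing from the database.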
