How to annotate a draft genome?
15 months ago
zhangdengwei

Hi all,

I have several bacterial draft genomes assembled with SPAdes. After checking them with CheckM, I want to annotate the bacteria whose contamination is below 5%, in order to find out which species they are. I aligned the draft genomes to the nt database, but I found that one particular genome could be assigned to several different bacteria. I am afraid this approach may not be sound. Is there an approach suitable for quick bacterial annotation? Thanks in advance.

Tags: contigs, bacteria, draft genome

Can you clarify what

which are with 'contamination' < 5%,

means?

Are these mixed samples (e.g. metagenomes)? Are you expecting contamination? If you have single, clean draft genome assemblies, Prokka is a tool of choice for annotation.
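For reference, running Prokka over each clean bin could look roughly like the sketch below. It only builds the command lines and does not run anything. `--outdir`, `--prefix` and `--kingdom` are standard Prokka options, while the `bins/*.fa` file names and the `prokka_out` folder are hypothetical placeholders; check `prokka --help` for your installed version.

```python
import shlex
from pathlib import Path


def prokka_commands(bin_fastas, out_root="prokka_out"):
    """Build (but do not run) one Prokka command per bin FASTA file."""
    commands = []
    for fasta in bin_fastas:
        prefix = Path(fasta).stem  # "bins/bin_01.fa" -> "bin_01"
        commands.append([
            "prokka",
            "--outdir", str(Path(out_root) / prefix),  # one output folder per bin
            "--prefix", prefix,                        # prefix for output file names
            "--kingdom", "Bacteria",
            str(fasta),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical bin files; replace them with your own CheckM-filtered bins.
    for cmd in prokka_commands(["bins/bin_01.fa", "bins/bin_02.fa"]):
        print(shlex.join(cmd))
```

Each returned list can be passed directly to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting problems with unusual file names.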


Yes, my draft genomes were derived from metagenomes, and I tried to split them into individual bacterial genomes (bins). Afterwards, I used CheckM to determine whether each bin is clean. Each separated bacterium has hundreds of contigs, so now I want to know which bacterium each one is.
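Selecting the clean bins can be scripted. The sketch below assumes a tab-separated CheckM summary with `Bin Id`, `Completeness` and `Contamination` columns (the kind of table `checkm qa` can write with `--tab_table`, as far as I recall; verify the header against your own file). The 5% contamination cut-off comes from the question; the optional completeness floor and the sample bins are hypothetical.

```python
import csv
import io

# Hypothetical example table; real CheckM tables contain more columns.
SAMPLE = (
    "Bin Id\tCompleteness\tContamination\n"
    "bin_01\t95.2\t1.3\n"
    "bin_02\t88.0\t7.9\n"
    "bin_03\t60.4\t0.5\n"
)


def clean_bins(checkm_tsv_text, max_contamination=5.0, min_completeness=0.0):
    """Return IDs of bins below the contamination cut-off and at or above the completeness floor."""
    reader = csv.DictReader(io.StringIO(checkm_tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        contamination = float(row["Contamination"])
        completeness = float(row["Completeness"])
        if contamination < max_contamination and completeness >= min_completeness:
            kept.append(row["Bin Id"])
    return kept


if __name__ == "__main__":
    print(clean_bins(SAMPLE))                         # -> ['bin_01', 'bin_03']
    print(clean_bins(SAMPLE, min_completeness=70.0))  # -> ['bin_01']
```

For a real file, pass its contents in, e.g. `with open(path) as fh: clean_bins(fh.read())`. A completeness floor is worth considering because a nearly empty bin can have low contamination simply because it contains few marker genes.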


I would try using a dedicated metagenomic annotation and binning pipeline such as: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03585-4

Though I've never tried it myself, so I can't vouch for it.

15 months ago
Mensur Dlakic

Assuming that you have a computer with at least 128 GB of RAM (or a combination of RAM and swap totalling more than 128 GB), the most consistent way of doing this is with the GTDB toolkit (GTDB-Tk). Below is an example of the final output for each bin that is more than 10% complete (it is truncated on the right side so that it does not run too far). As you can see, most metagenomic bins are classified down to the family or genus level, with a couple of them having a species designation.

group_06        d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Desulfurococcales;f__Desulfurococcaceae;g__Thermosphaera;s__Thermosphaera aggregans
group_11        d__Archaea;p__Nanoarchaeota;c__Nanoarchaeia;o__Nanoarchaeales;f__Nanopusillaceae;g__Nanopusillus;s__
group_16        d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Desulfurococcales;f__Acidilobaceae;g__;s__
group_04        d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Thermoproteales;f__Thermoproteaceae;g__Pyrobaculum;s__
group_18        d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Thermofilales;f__Thermofilaceae;g__;s__
group_15        d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales;f__Thermaceae;g__Thermus;s__Thermus aquaticus
group_03        d__Bacteria;p__Aquificota;c__Aquificae;o__Aquificales;f__Aquificaceae;g__Thermocrinis;s__
group_17        d__Bacteria;p__Desulfobacterota;c__Thermodesulfobacteria;o__Thermodesulfobacteriales;f__Thermodesulfobacteriaceae;g__;s__
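If you want to tabulate how deeply each bin was classified, the semicolon-separated strings are easy to parse. A minimal sketch, assuming each line starts with the bin name followed by the classification string, as in the output pasted above (in GTDB-Tk's summary TSV these should be the `user_genome` and `classification` columns, but check your version). An empty rank such as `g__` means no name was assigned at that level.

```python
# Map GTDB rank prefixes (d__, p__, ...) to rank names.
RANK_BY_PREFIX = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def deepest_rank(classification):
    """Return (rank, name) for the most specific named rank in a GTDB string."""
    best = ("unclassified", "")
    for field in classification.strip().split(";"):
        prefix, sep, name = field.strip().partition("__")
        # An empty name (e.g. "g__") means nothing was assigned at that rank.
        if sep and name and prefix in RANK_BY_PREFIX:
            best = (RANK_BY_PREFIX[prefix], name)
    return best


def summarize(lines):
    """Map bin name -> (rank, name), skipping blank lines and a GTDB-Tk header line."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("user_genome"):
            continue
        parts = line.split(maxsplit=1)
        if len(parts) < 2:
            continue  # no classification on this line
        bin_id, rest = parts
        classification = rest.split("\t")[0]  # ignore any further TSV columns
        results[bin_id] = deepest_rank(classification)
    return results


if __name__ == "__main__":
    example = [
        "group_06  d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Desulfurococcales;"
        "f__Desulfurococcaceae;g__Thermosphaera;s__Thermosphaera aggregans",
        "group_11  d__Archaea;p__Nanoarchaeota;c__Nanoarchaeia;o__Nanoarchaeales;"
        "f__Nanopusillaceae;g__Nanopusillus;s__",
        "group_16  d__Archaea;p__Crenarchaeota;c__Thermoprotei;o__Desulfurococcales;"
        "f__Acidilobaceae;g__;s__",
    ]
    for bin_id, (rank, name) in summarize(example).items():
        print(f"{bin_id}\t{rank}\t{name}")
```

Applied to group_06, group_11 and group_16 from the output above, this reports species, genus and family, respectively.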


Many thanks. That's what I want.

15 months ago

As Mensur said, GTDB-Tk can be used to determine the taxonomic affiliation of your bins. In case you do not have 128 GB of RAM, GTDB-Tk has also been implemented in KBase.