Ortholog clusters in Bacteria
2.2 years ago

Hi,

Are there any databases or initiatives that compute ortholog clusters in Bacteria? The NCBI COG database covers only 1,300 bacterial species. I am looking for something more comprehensive, ideally covering all 45,000 bacterial species from the Genome Taxonomy Database (GTDB).

Thanks!

orthologs gene family bacteria GtDB

I think eggNOG is the largest database, with precomputed ortholog groups from 4,400 representative bacterial species.

2.2 years ago
Mensur Dlakic ★ 27k

The COG database was last updated in 2020, which you probably know. Just for the sake of other readers:

https://academic.oup.com/nar/article/49/D1/D274/5964069

As they indicate in the section titled "Expanded genome coverage", it was too computationally intensive to analyze all available genomes and MAGs, and the majority of both bacteria and archaea in GTDB are MAGs. Even covering all complete genomes was too demanding, so they settled on a smaller group. Given the number of COGs (4877), I think that group is representative of both domains even though it is based on a relatively small fraction of the total.

There is some evidence of novel and unusual protein families in more recently discovered bacteria:

Still, it is a safe bet that many of those are just divergent variants of known proteins, as it is unlikely that these groups evolved thousands of protein families different from all other bacteria.

The first reference above has more ideas about doing this kind of analysis if you want to try it on your own, which I don't recommend. I have done it for about 5,000 MAGs, and it is very difficult to set up and takes a great deal of time, memory and general resources. If you are not easily dissuaded, this may help:

https://github.com/raphael-upmc/proteinClusteringPipeline
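To give a rough feel for what the clustering step of a do-it-yourself analysis involves, here is a minimal sketch in Python. It is not the method used by COG, eggNOG, or the pipeline linked above. It assumes you already have pairwise similarity hits (for example from an all-vs-all protein search) as `(query_id, subject_id, evalue)` tuples, and it groups proteins by simple single-linkage with a union-find structure. The function name `cluster_proteins` and the `max_evalue` cutoff are hypothetical choices for illustration. Note that this yields families of similar proteins, not true ortholog groups, which need additional steps such as reciprocal best hits or phylogeny.

```python
from collections import defaultdict


def cluster_proteins(hits, max_evalue=1e-5):
    """Single-linkage clustering of protein IDs from pairwise hits.

    hits: iterable of (query_id, subject_id, evalue) tuples (assumed format).
    max_evalue: hypothetical cutoff; hits above it do not link proteins.
    Returns a list of clusters (sorted lists of IDs), largest first.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject, evalue in hits:
        # Make sure both proteins appear in the output even if not linked.
        find(query)
        find(subject)
        if evalue <= max_evalue:
            union(query, subject)

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)

    # Largest clusters first; ties broken by member IDs for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Toy hits with made-up IDs and E-values, for illustration only.
    example_hits = [
        ("A1", "B1", 1e-30),
        ("B1", "C1", 1e-20),
        ("A2", "B2", 1e-10),
        ("A1", "A2", 0.5),   # too weak, does not merge the two groups
    ]
    for cluster in cluster_proteins(example_hits):
        print(cluster)
```

Even this toy version hints at why the real thing is expensive: the number of pairwise hits grows very quickly with the number of genomes, and single-linkage tends to chain unrelated families together through multidomain proteins, so real pipelines use more careful clustering.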
