Comprehensive bacterial phylogenetic tree with edge lengths for all of RefSeq or GenBank at NCBI?
4 weeks ago
cmo ▴ 70

Is there a phylogenetic tree with edge lengths that encompasses all of the bacterial genomes in, e.g., RefSeq or GenBank at NCBI?

Basically, an extension of the NCBI Taxonomy to include phylogenetic distances/edge lengths.

Looking around, I found the following related but unsatisfactory resources:

  1. The NCBI tool CommonTree outputs a tree for a given set of input taxonomy IDs, but the resulting tree has no edge lengths, so it is just the NCBI Taxonomy in a different format.

  2. The NCBI Taxonomy FTP does not have any tree-like files or anything with phylogenetic distance-like information.

  3. phyloT has a tree with distances, but it requires a subscription or payment, and I really need something that stays up to date with the rapidly evolving NCBI repositories (RefSeq, GenBank, Taxonomy).

  4. The Genome Taxonomy Database (GTDB) has bacterial and archaeal trees, and the nodes and tips of those trees seem to carry RefSeq/GenBank assembly IDs, but GTDB uses different taxonomic assignments from NCBI, and thus a different phylogenetic tree.

refseq ncbi tree phylogeny • 667 views

I don't think such a tree exists. However, I can say with great confidence that even the bacterial part of RefSeq includes a very large number of incorrectly annotated genomes. I arrived independently (with my own methodology) at largely the same conclusions as, e.g., this repo, although my resolution is higher 😎

4 weeks ago
Mensur Dlakic ★ 11k

There is a good argument to be made that the GTDB taxonomy is more reliable at the moment. Numerous metagenome-assembled genomes (MAGs) are only tentatively categorized by NCBI, yet have a clear phylogenetic assignment in GTDB. But don't take my word for it: you can make up your own mind by reading the recent literature.

There are tools on the GTDB site that presumably can convert GTDB annotations to NCBI ones. I haven't tried them myself, but they may be useful to you.
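Separately from those tools, here is a minimal sketch of how the GTDB tree's edge lengths could be carried over to NCBI assembly accessions. It assumes the tip labels in the GTDB Newick file are NCBI assembly accessions prefixed with `RS_` (RefSeq) or `GB_` (GenBank), which is worth checking against the release you download. The function name `relabel_newick` and the toy tree are made up for illustration, and the branch lengths in the toy tree are invented.

```python
import re

# Assumption: GTDB writes genome tips as "RS_" + RefSeq accession or
# "GB_" + GenBank accession, e.g. RS_GCF_000005845.2.
GTDB_TIP = re.compile(r"(?:RS_|GB_)(GC[AF]_\d+\.\d+)")


def relabel_newick(newick: str) -> str:
    """Strip the RS_/GB_ prefix from tip labels in a Newick string.

    Topology and branch lengths are left untouched; only labels that
    look like prefixed assembly accessions are rewritten.
    """
    return GTDB_TIP.sub(r"\1", newick)


# Toy tree: the branch lengths are invented for this example.
toy = "((RS_GCF_000005845.2:0.01,GB_GCA_000008865.2:0.02):0.05,RS_GCF_000006945.2:0.1);"
print(relabel_newick(toy))
```

Note that this only renames the tips. The topology and distances are still GTDB's, so the result will not necessarily agree with the NCBI Taxonomy hierarchy.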


I agree with you, Mensur, that GTDB is arguably more reliable and comprehensive than NCBI. However, because NCBI has been around longer and is more widely used, the problem setting or application often constrains us to the NCBI taxonomic tree, even if we would prefer GTDB.
