Question

Clustered bacterial RefSeq?

9 months ago

predeus ★ 1.9k

Hi all,

I am sure I am missing something obvious, and I hope someone can point me in the right direction.

I was wondering whether there are datasets similar to UniRef90/UniRef50, but built from bacterial RefSeq genome sequences, e.g. by clustering on average nucleotide identity (ANI). Basically, it would be good to have a "rarefied" database of, say, 10-20k genomes defined by some sort of clustering, without 1,000 E. coli genomes.

Thank you in advance, as always!
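The kind of clustering described in this question can be sketched as greedy dereplication: walk through the genomes in order of preference and make each one a new representative unless it already reaches a threshold ANI with an existing representative. This is a minimal sketch, not how NCBI or UniRef build their sets. It assumes pairwise ANI values (in percent) have already been computed, for example with a tool such as fastANI or skani, and loaded into a dict keyed by genome-name pairs; the function name, genome names and ANI numbers are hypothetical. The default threshold of 95% reflects the commonly used approximate species boundary.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: {(genome_a, genome_b): ani_percent, ...}
AniTable = Dict[Tuple[str, str], float]


def greedy_ani_dereplicate(genomes: List[str], ani: AniTable,
                           threshold: float = 95.0) -> Dict[str, List[str]]:
    """Greedily group genomes into clusters around representatives.

    Genomes are visited in the given order; the first genome of each
    cluster becomes its representative, so sort by preference first.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for rep in clusters:
            # Treat ANI as symmetric: accept the pair in either orientation.
            # Missing pairs default to 0.0, i.e. "not similar".
            value = ani.get((genome, rep), ani.get((rep, genome), 0.0))
            if value >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


# Toy, made-up ANI values for illustration only.
toy_ani = {
    ("ecoli_1", "ecoli_2"): 98.7,
    ("ecoli_1", "salmonella_1"): 82.0,
    ("ecoli_2", "salmonella_1"): 81.5,
}
print(greedy_ani_dereplicate(["ecoli_1", "ecoli_2", "salmonella_1"], toy_ani))
# → {'ecoli_1': ['ecoli_1', 'ecoli_2'], 'salmonella_1': ['salmonella_1']}
```

Because representatives are picked greedily, input order matters: sorting genomes by assembly quality first means each cluster is represented by its best assembly. The loop compares every genome with every current representative, so for tens of thousands of genomes a dedicated dereplication tool such as dRep is the more practical route.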

refseq clustering ANI bacteria

score 2 · Accepted Answer · 2023-07-11

9 months ago

GenoMax 142k

This sounds like NCBI's Prokaryotic representative reference genome sequences: https://www.ncbi.nlm.nih.gov/refseq/about/prokaryotes/#representative_genomes

Here is that list (17,500 genomes as of July 2023): https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/refseq_category:representative

There is also a prebuilt BLAST database of these genomes: ref_prok_rep_genomes
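If the goal is a local list of these genomes rather than the BLAST database, one option is NCBI's RefSeq assembly summary table (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt), a tab-separated file whose refseq_category column marks assemblies as "reference genome" or "representative genome". The sketch below assumes that layout (comment lines start with "#", and one of them carries the column names) and that the category values are still spelled that way; NCBI has revised these files over time, so check the accompanying README first. The function name and the sample rows, including their accessions, are made up for illustration.

```python
from typing import Iterable, List, Sequence

# Assumed category labels in the refseq_category column.
DEFAULT_CATEGORIES = ("reference genome", "representative genome")


def representative_accessions(lines: Iterable[str],
                              categories: Sequence[str] = DEFAULT_CATEGORIES) -> List[str]:
    """Return assembly accessions whose refseq_category is in `categories`.

    `lines` can be an open file handle on assembly_summary.txt or any
    iterable of its lines.
    """
    columns: List[str] = []
    hits: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            # The column-name line is assumed to look like
            # "# assembly_accession\tbioproject\t...\trefseq_category\t..."
            if "assembly_accession" in line:
                columns = line.lstrip("#").strip().split("\t")
            continue
        if not columns or not line:
            continue  # skip rows seen before the header, and blank lines
        row = dict(zip(columns, line.split("\t")))
        if row.get("refseq_category") in categories:
            hits.append(row["assembly_accession"])
    return hits


# Toy input with made-up accessions (not real assemblies).
sample = [
    "#   See the README for a description of the columns in this file.\n",
    "# assembly_accession\tbioproject\tbiosample\twgs_master\trefseq_category\ttaxid\n",
    "GCF_000000001.1\tPRJNA1\tSAMN1\t\treference genome\t1\n",
    "GCF_000000002.1\tPRJNA2\tSAMN2\t\trepresentative genome\t2\n",
    "GCF_000000003.1\tPRJNA3\tSAMN3\t\tna\t3\n",
]
print(representative_accessions(sample))
# → ['GCF_000000001.1', 'GCF_000000002.1']
```

With the downloaded file, the call would be along the lines of opening assembly_summary.txt and passing the file handle as `lines`; passing categories=("representative genome",) keeps only the representative set.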

Thank you! I must have seen the "representative genome" descriptor a hundred times, yet it never occurred to me that this is what it meant.

9 months ago by predeus ★ 1.9k