Hi all,
I am sure I am missing something obvious, and I hope someone can point me in the right direction.
I was wondering whether there are datasets similar to UniRef90/UniRef50 etc., but built from bacterial RefSeq genome sequences, e.g. by clustering on something like ANI? Basically, it would be good to have a "rarefied" database of, say, 10-20k genomes defined by some sort of clustering, without 1000 E. coli genomes etc.
Thank you in advance, as always!
Thank you! I must have seen the "representative genome" descriptor a hundred times, yet it never occurred to me that's what it is.