I want to find the closest complete Salmonella genome for any given input NGS sample. More than 600 complete Salmonella genomes are available, which makes for a "database" containing many very closely related sequences (strains).
I have already tried StrainSeeker. A first test against the software's default database was promising (it identified the expected serotype). However, it seems to fail with a custom database containing the 600+ genomes, where the strains are so similar (perhaps there are not enough discriminating k-mers between strains?).
Maybe I should use a smaller number of complete genomes and focus on the best representatives as a starting point?
Maybe there is a tool made for this particular job that I am not aware of? (I tried my best to find one, but I am new to microbial comparative genomics.)
Or maybe this approach is not correct. The underlying goal is to tell whether our newly sequenced Salmonella strains correspond to something new compared with the currently available complete genomes. So far, I thought that identifying the closest reference genome and calling SNPs against it would be good enough.
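To make "closest genome" concrete, here is a minimal sketch of the kind of comparison I have in mind: rank the references by the Jaccard similarity of their canonical k-mer sets to the sample. This is only an illustration on toy sequences and is not how StrainSeeker works internally. The function names, the toy sequences, and the choice of k are all my own assumptions, and exact k-mer sets like these would not scale to raw reads and 600+ genomes without some form of subsampling.

```python
from typing import Dict, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Collect canonical k-mers (the smaller of a k-mer and its reverse complement).

    K-mers containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(sample: str, references: Dict[str, str],
                    k: int = 21) -> List[Tuple[str, float]]:
    """Rank reference sequences by k-mer Jaccard similarity to the sample, best first."""
    sample_kmers = canonical_kmers(sample, k)
    scores = [(name, jaccard(sample_kmers, canonical_kmers(seq, k)))
              for name, seq in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): ref_b differs from ref_a by a single substitution.
ref_a = "ACGTTGCATGCCGATAGCTA"
ref_b = "ACGTTGCATGACGATAGCTA"
# The sample is the opposite strand of ref_a, to show that strand does not matter.
sample = reverse_complement(ref_a)

for name, score in rank_references(sample, {"ref_a": ref_a, "ref_b": ref_b}, k=5):
    print(f"{name}\t{score:.3f}")
```

In this toy case a single substitution is enough to separate the two references, but with real genomes that differ by only a handful of SNPs the scores would sit very close together, which is the lack of discriminating k-mers I suspect above.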