You haven't mentioned your bacterial species yet, and without it any answer is a guess, since reasonable SNP counts vary widely between species. For example, outbreak strains of Salmonella spp. might show fewer than 10 SNPs, while diverse Escherichia coli populations could rack up hundreds.
That said, here's a quick way to check your baseline:
First, do a literature scan. Head to PubMed and search for "[your species] SNP diversity" or papers on outbreaks for that bacterium. As a rough guide, expect about 0.1–1% nucleotide divergence for closely related strains, which translates to roughly 1,000–10,000 SNPs per megabase of genome (0.1% of 1,000,000 bp is 1,000).
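To sanity-check figures quoted in papers, the conversion between percent divergence and SNP counts is plain arithmetic. Here is a minimal sketch; the function names and the 5 Mb genome size are just illustrative assumptions, not anything tied to a particular tool.

```python
def expected_snps(genome_size_bp: int, divergence_percent: float) -> float:
    """Expected number of SNPs between two genomes of the given size
    that differ at the given percentage of positions."""
    return genome_size_bp * divergence_percent / 100.0


def snps_per_megabase(divergence_percent: float) -> float:
    """Same calculation, normalised to 1 Mb (1,000,000 bp)."""
    return expected_snps(1_000_000, divergence_percent)


if __name__ == "__main__":
    # Hypothetical 5 Mb genome, at the low and high end of the range.
    for pct in (0.1, 1.0):
        print(f"{pct}% divergence: "
              f"{snps_per_megabase(pct):,.0f} SNPs/Mb, "
              f"{expected_snps(5_000_000, pct):,.0f} SNPs in a 5 Mb genome")
```

Running the same numbers the other way (SNPs divided by genome length) tells you what divergence your own counts imply.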
Second, check public databases. Tools like BV-BRC or NCBI's Pathogen Detection let you compare your SNP counts against public genomes of the same species. Your own pairwise distances should land somewhere close to the median pairwise distances you see there.
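If you can get an alignment of public genomes for your species, computing a median pairwise SNP distance to compare against takes only a few lines. This is a rough sketch that assumes the sequences are already aligned to equal length; the toy sequences and function names are made up for illustration, and positions with gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations
from statistics import median

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different
    unambiguous bases. Gaps and N are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def median_pairwise_distance(alignment: dict[str, str]) -> float:
    """Median SNP distance over all pairs of sequences in the alignment."""
    distances = [
        snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    ]
    return median(distances)


if __name__ == "__main__":
    # Toy alignment, purely illustrative.
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAT",
        "C": "ACGAACGTAT",
        "D": "ACGTNCGTAC",
    }
    print(median_pairwise_distance(toy))
```

The median is used here rather than the mean because a handful of distant outliers in a public collection can otherwise dominate the figure.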
Third, validate your own pipeline. Make sure you're filtering your VCF files properly (say, variant quality (QUAL) above 30 and coverage depth over 20x). If your numbers are more than twice the literature values, double-check your alignment and your choice of reference genome.
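As a quick check of what those filters do to your counts, here is a small sketch that applies them to raw VCF lines using only the standard library. It assumes the depth is stored as `DP` in the INFO column, which is common but not universal (some callers only report it per sample in the FORMAT fields), and the function names are hypothetical. For real work a dedicated tool such as bcftools is the safer choice.

```python
from typing import Iterable


def passes_filters(vcf_line: str, min_qual: float = 30.0, min_depth: int = 20) -> bool:
    """Return True if a VCF data line has QUAL > min_qual and INFO DP > min_depth."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    qual = fields[5]
    if qual == ".":
        return False  # missing quality value
    # INFO is a semicolon-separated list of KEY=VALUE pairs and bare flags.
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    depth = info.get("DP")
    if depth is None:
        return False  # assumption: no INFO/DP means we cannot judge depth
    return float(qual) > min_qual and int(depth) > min_depth


def filter_records(lines: Iterable[str], min_qual: float = 30.0, min_depth: int = 20) -> list[str]:
    """Keep data lines that pass both thresholds; header lines are skipped."""
    return [
        line
        for line in lines
        if not line.startswith("#") and passes_filters(line, min_qual, min_depth)
    ]


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35",
        "chr1\t200\t.\tC\tT\t12\tPASS\tDP=40",
        "chr1\t300\t.\tG\tA\t60\tPASS\tDP=8",
    ]
    print(len(filter_records(example)), "record(s) kept")
```

Comparing the SNP count before and after filtering gives a feel for how much of your total is likely to be noise.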
If you share the species, plus maybe a ballpark on your SNP counts and sequencing depth, I can point you to more specific references. So, what's the bacterium?
There are a lot of factors that can impact the number of identified SNPs. These include things like:
The list goes on. So finding an expected number would likely require someone knowledgeable about the specific system and species you are using.
If your bacterial species is a (human) pathogen, you might want to check outbreak analysis papers for that species.
Those papers generally report how many SNPs are a reasonable cutoff for treating two assemblies as coming from a single source.
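Once you have a cutoff from such a paper, applying it to your own pairwise distances could look something like the sketch below. It groups assemblies by single linkage, so two assemblies end up together if a chain of pairs at or below the threshold connects them. The threshold of 10, the sample names, and the distances are placeholders, not recommendations; the real cutoff has to come from the literature for your species.

```python
def cluster_by_snp_threshold(
    distances: dict[tuple[str, str], int], threshold: int
) -> list[list[str]]:
    """Single-linkage grouping: link any pair whose SNP distance is
    at or below the threshold, then return the connected groups."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)
        if dist <= threshold and root_a != root_b:
            parent[root_b] = root_a

    clusters: dict[str, list[str]] = {}
    for name in parent:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical pairwise SNP distances between four assemblies.
    toy_distances = {
        ("a", "b"): 3, ("b", "c"): 8, ("a", "c"): 11,
        ("a", "d"): 160, ("b", "d"): 155, ("c", "d"): 150,
    }
    print(cluster_by_snp_threshold(toy_distances, threshold=10))
```

Note that single linkage can chain: in the toy data "a" and "c" are 11 SNPs apart yet land in the same group because both are close to "b". Whether that is acceptable depends on how the papers for your species define a cluster.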