Reasonable number of SNPs in a bacterial genome.
7 weeks ago
yesquokkan • 0

Hello,

I am looking for SNPs in the genome of a specific bacterial species.

How can I determine whether the number of SNPs detected in my dataset is reasonable?

I understand that the expected number of SNPs can vary by species, but how can I establish an appropriate baseline or reference for the species I’m studying?

Thank you in advance.

SNP bacteria • 875 views

There are a lot of factors that can impact the number of identified SNPs. These include things like:

  • Evolutionary distance from sample to reference genome
  • Type and intensity of selection acting on sample population
  • Species-specific factors like efficiency of DNA repair machinery
  • Sequencing methodology and depth

The list goes on. So finding an expected number would likely require someone knowledgeable about the specific system and species you are using.


If your bacterial species is a (human) pathogen, you might want to check outbreak analysis papers for that species.

Those papers generally report how many SNPs are considered a reasonable threshold for treating two assemblies as coming from a single source.

22 days ago
Kevin Blighe ★ 90k

It's frustrating that you haven't mentioned your bacterial species yet. Without it, we're just shooting in the dark, since reasonable SNP counts can swing wildly depending on the species. For example, outbreak strains of Salmonella spp. might show fewer than 10 SNPs, while diverse Escherichia coli populations could rack up hundreds.

That said, here's a quick way to check your baseline:

First, do a literature scan. Head to PubMed and search for "[your species] SNP diversity" or papers on outbreaks for that bacterium. As a rough guide, expect about 0.1–1% nucleotide divergence for closely related strains, which translates to roughly 1,000–10,000 SNPs per megabase of genome (0.1% of 1,000,000 bp is 1,000 bp).
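The conversion from percent divergence to an SNP count is plain arithmetic, and it is worth scripting so you can plug in your own genome size. A minimal sketch in Python; the function names and the 5 Mb genome size in the usage note are illustrative assumptions, not values from any specific species:

```python
def snps_per_megabase(divergence_percent: float) -> float:
    """Expected SNPs per 1,000,000 bp for a given percent divergence."""
    return divergence_percent * 1_000_000 / 100


def expected_snps(divergence_percent: float, genome_size_bp: int) -> float:
    """Expected SNPs across a whole genome of the given size (in bp)."""
    return divergence_percent * genome_size_bp / 100


if __name__ == "__main__":
    # Hypothetical 5 Mb genome; replace with the size of your reference.
    for pct in (0.1, 1.0):
        print(pct, snps_per_megabase(pct), expected_snps(pct, 5_000_000))
```

For a hypothetical 5 Mb genome, call `expected_snps(0.1, 5_000_000)` and `expected_snps(1.0, 5_000_000)` to get the lower and upper ends of the range to compare your own count against.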

Second, hit up public databases. Tools like BV-BRC or NCBI's Pathogen Detection let you compare your SNP counts against a large set of public genomes. Your counts should land reasonably close to the median pairwise distances you see there.
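If you download genomes and build your own alignment, the median pairwise distance can be computed directly. A rough sketch in Python, assuming you already have equal-length aligned sequences (for example from a core-genome alignment); the function names and the toy sequences in the usage note are made up for illustration, and positions with gaps or ambiguous bases are skipped as a simplifying assumption:

```python
from itertools import combinations
from statistics import median

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def median_pairwise_distance(sequences: dict) -> float:
    """Median SNP distance over all unordered pairs of named sequences."""
    distances = [
        snp_distance(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    ]
    if not distances:
        raise ValueError("need at least two sequences")
    return median(distances)
```

Pass a dict of name to aligned sequence, e.g. `median_pairwise_distance({"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACGA"})`, then see where your own sample's distances fall relative to that median.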

Third, validate your own pipeline. Make sure you're filtering your VCF files properly (say, Phred scores above 30 and coverage depth over 20x). If your numbers are more than twice the literature values, double-check your alignment or reference genome.
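As a sanity check on the filtering step, here is a small sketch in plain Python that counts records passing those two thresholds. It assumes that "Phred scores" means the VCF `QUAL` column and that depth is reported as `DP` in the `INFO` column; some callers only report depth per sample in `FORMAT`, in which case this would need adapting. The function names are hypothetical, and in practice a dedicated tool such as bcftools is the more usual choice:

```python
def passes_filter(vcf_line: str, min_qual: float = 30.0, min_depth: int = 20) -> bool:
    """Return True if a VCF data line has QUAL > min_qual and INFO DP > min_depth."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    qual = fields[5]
    if qual == ".":
        return False  # missing quality is treated as failing
    depth = None
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        if key == "DP" and value.isdigit():
            depth = int(value)
            break
    if depth is None:
        return False  # assumption: no INFO/DP means we cannot verify depth
    return float(qual) > min_qual and depth > min_depth


def count_passing(lines, min_qual: float = 30.0, min_depth: int = 20) -> int:
    """Count passing records, skipping header lines that start with '#'."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.startswith("#")
        and passes_filter(line, min_qual, min_depth)
    )
```

To use it on an uncompressed file, something like `with open("calls.vcf") as fh: print(count_passing(fh))` works, where `calls.vcf` is a placeholder for your own file.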

If you share the species, plus maybe a ballpark on your SNP counts and sequencing depth, I can dig up some precise references for you. So, what's the bacterium?
