Question: Whole-genome alignment of two or more bacterial genomes - find structural variants
11 months ago by
United Kingdom
Tim110 wrote:


Let's say I have two or more complete bacterial genome sequences produced by Sanger sequencing and/or Nanopore/PacBio (no Illumina reads). The genomes in question are 95-99% identical at the nucleotide level. What would be the best way to align these genomes (with a pairwise or multiple alignment) and identify:

  • Short variants: single and multiple nucleotide variations (SNP/MNP), indels
  • Long variants: longer deletions and insertions, inversions, duplications, translocations and so on

I am aware of MUMmer, Mauve and Mugsy; what other programs should I check? It would be great if they could produce a VCF file as well.


11 months ago by
United States
kcamnairb40 wrote:

NucDiff looks interesting. I haven't been able to try it yet, but I know it can output a VCF file.


NucDiff looks interesting for sure, will give it a go, thanks.

written 11 months ago by Tim110

Thank you for recommending NucDiff, it does exactly what I wanted.

written 11 months ago by Tim110
11 months ago by
toralmanvar750 wrote:

For structural variation analysis, you can try Assemblytics, which takes the .delta file generated by NUCmer (NUCleotide MUMmer) as input.
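
A minimal sketch of how the NUCmer step could be scripted, assuming MUMmer is installed and on the PATH. The function names, file names and the choice of `--maxmatch` are illustrative assumptions, not something the tools or this answer prescribe; check the MUMmer and Assemblytics documentation for the settings recommended for your data. The builders only return the command lines, so they can be inspected before anything is run.

```python
import subprocess
from typing import List, Optional


def nucmer_cmd(reference: str, query: str, prefix: str) -> List[str]:
    """Command that aligns query against reference and writes <prefix>.delta."""
    return ["nucmer", "--maxmatch", f"--prefix={prefix}", reference, query]


def delta_filter_cmd(delta_path: str) -> List[str]:
    """Command that keeps a 1-to-1 alignment; the filtered delta goes to stdout."""
    return ["delta-filter", "-1", delta_path]


def show_snps_cmd(delta_path: str) -> List[str]:
    """Command that lists SNPs and small indels as tab-delimited text on stdout.

    -C skips ambiguously mapped alignments, -l adds sequence lengths,
    -r sorts by reference position, -T selects tab-delimited output.
    """
    return ["show-snps", "-C", "-l", "-r", "-T", delta_path]


def run(cmd: List[str], stdout_path: Optional[str] = None) -> None:
    """Run one command, optionally redirecting its stdout to a file."""
    if stdout_path is None:
        subprocess.run(cmd, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(cmd, check=True, stdout=out)


def compare_genomes(reference: str, query: str, prefix: str) -> None:
    """Hypothetical driver: align, filter, then extract short variants."""
    run(nucmer_cmd(reference, query, prefix))              # writes <prefix>.delta
    run(delta_filter_cmd(f"{prefix}.delta"), f"{prefix}.1to1.delta")
    run(show_snps_cmd(f"{prefix}.1to1.delta"), f"{prefix}.snps.tsv")
```

Calling `compare_genomes("ref.fasta", "qry.fasta", "out")` would leave `out.delta` (the kind of file Assemblytics takes as input) alongside a tab-delimited list of short variants in `out.snps.tsv`.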



What about short variants? How would a genome aligner be able to identify SNPs?

written 11 months ago by naive_user70

In my opinion, the best way to identify short variants is to map the reads from one bacterium to the reference genome/scaffolds of another and call variants with a samtools/GATK pipeline.

written 11 months ago by toralmanvar750

Well, that does not answer my question. In any case, since the OP has long-read data, it is fairly difficult to identify SNPs given the relatively high sequencing error rate.

written 11 months ago by naive_user70

Olson ND, Lund SP, Colman RE, et al. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Frontiers in Genetics. 2015;6:235. doi:10.3389/fgene.2015.00235.

Calling SNPs using a genome assembly

SNPs can be identified from genome assemblies, however, since coverage is 1x at each position in an assembly, spurious SNPs cannot be filtered due to insufficient coverage, nor can contaminating genomes be identified and subsequently removed. For individual genes, SNPs are identified by extracting alignments using BLASTN (Altschul et al., 1990) followed by pairwise alignment of the SNPs. For whole genome assemblies, SNPs are typically identified from whole genome alignments made with software such as MUMmer (Kurtz et al., 2004), Mugsy (Angiuoli and Salzberg, 2011), and Mauve (Darling et al., 2004). Software has also been developed for the identification of SNPs from genome assemblies for whole genome phylogenetics including kSNP (Gardner and Hall, 2013) and parSNP (Treangen et al., 2014). SNP identification using assemblies is useful when analyzing individual genes, processing huge datasets, or if raw reads are unavailable. However, when using assemblies for SNP discovery, SNPs cannot be evaluated and verified with the underlying raw read data.
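
To make the idea of calling variants from an alignment concrete, here is a minimal sketch that walks a pairwise alignment column by column and reports SNPs and indels. It assumes an aligner has already produced two gapped strings of equal length with `-` as the gap character; the function name `diff_alignment` and the tuple layouts are hypothetical, and nothing here filters spurious calls, which is exactly the limitation the passage above describes.

```python
from typing import List, Tuple

Snp = Tuple[int, str, str]      # (1-based reference position, ref base, query base)
Indel = Tuple[int, str, str]    # (last reference position before the event, kind, sequence)


def diff_alignment(ref_aln: str, qry_aln: str) -> Tuple[List[Snp], List[Indel]]:
    """Return (snps, indels) found in two gapped, equal-length aligned strings."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    snps: List[Snp] = []
    indels: List[Indel] = []
    n = len(ref_aln)
    ref_pos = 0  # number of reference bases consumed so far
    i = 0
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":
            i += 1  # empty column, nothing to report
        elif r == "-":
            # Insertion in the query: extend over the whole run of reference gaps.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            indels.append((ref_pos, "INS", qry_aln[i:j].replace("-", "")))
            i = j
        elif q == "-":
            # Deletion in the query: extend over the whole run of query gaps.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            indels.append((ref_pos, "DEL", ref_aln[i:j]))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r.upper() != q.upper():
                snps.append((ref_pos, r, q))
            i += 1
    return snps, indels


snps, indels = diff_alignment("ACGT-ACGTAC", "ACCTGAC--AC")
```

In this toy example the reference G at position 3 is a C in the query, a G is inserted after reference position 4, and reference bases 7-8 (GT) are deleted. Each call is supported by a single aligned base, so an assembly error looks the same as a real variant.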

Long-read sequencing quality is improving; I actually think that short-read sequencing will be completely replaced by long-read sequencing in the next 5-10 years. That's why I am interested in whole-genome comparisons. As regards the identification of short variants, I realise that comparing genomes/contigs/scaffolds/consensus sequences is less reliable, and I agree that read mapping followed by GATK or FreeBayes SNP calling is probably the best method for identifying SNPs. Still, whole-genome comparisons are useful in some situations (absence of raw reads, for example).

written 11 months ago by Tim110

Thanks, haven't heard about it, will check later.

written 11 months ago by Tim110