Question: Whole-genome alignment of two or more bacterial genomes - find structural variants
0
Tim110 (United Kingdom) wrote 12 days ago:

Hello,

Let's say I have two or more complete bacterial genome sequences produced by Sanger sequencing and/or Nanopore/PacBio (no Illumina reads). The bacterial genomes in question are 95-99% identical at the nucleotide level. What would be the best way to align these genomes (with a pairwise or multiple alignment) and identify:

  • Short variants: single and multiple nucleotide variations (SNP/MNP), indels
  • Long variants: longer deletions and insertions, inversions, duplications, translocations and so on

I am aware of MUMmer, Mauve and Mugsy; what other programs should I check? It would be great if they could produce a VCF file as well.

Thanks.

modified 8 days ago • written 12 days ago by Tim110
2
kcamnairb40 (United States) wrote 12 days ago:

NucDiff looks interesting. I haven't been able to try it yet, but I know it can output a VCF file.

written 12 days ago by kcamnairb40

NucDiff looks interesting for sure, will give it a go, thanks.

written 12 days ago by Tim110

Thank you for recommending NucDiff, it does exactly what I wanted.

written 6 days ago by Tim110
1
toralmanvar290 wrote 12 days ago:

For structural variation analysis, you can try Assemblytics, which takes the .delta file generated by NUCmer (NUCleotide MUMmer) as input.
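As a rough sketch of how that .delta file might be produced, the Python snippet below only builds the command lines; it does not run them. The file names and prefix are placeholders, and the `-l 100 -c 500` values are an assumption based on commonly suggested settings for Assemblytics, so check the MUMmer and Assemblytics documentation before relying on them. The `show-snps` call is also part of MUMmer and is included here as one possible way to list short differences from the same alignment.

```python
import shlex


def nucmer_command(reference, query, prefix):
    """Build a nucmer command line; nucmer writes <prefix>.delta."""
    return [
        "nucmer",
        "--maxmatch",   # use all anchor matches, not only unique ones
        "-l", "100",    # minimum exact match length (assumed value)
        "-c", "500",    # minimum cluster length (assumed value)
        "-p", prefix,   # output prefix
        reference,
        query,
    ]


def show_snps_command(prefix):
    """Build a show-snps command line that reads <prefix>.delta."""
    return [
        "show-snps",
        "-C",  # skip SNPs from alignments with ambiguous mapping
        "-l",  # include sequence lengths in the output
        "-r",  # sort output by reference position
        "-T",  # tab-delimited output
        f"{prefix}.delta",
    ]


if __name__ == "__main__":
    # Placeholder file names; replace with your own assemblies.
    print(shlex.join(nucmer_command("ref.fasta", "query.fasta", "ref_vs_query")))
    print(shlex.join(show_snps_command("ref_vs_query")))
```

The printed commands can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` once MUMmer is installed. The resulting `ref_vs_query.delta` is the file Assemblytics expects as input.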

written 12 days ago by toralmanvar290
1

@toral

What about short variants? How would a genome aligner be able to identify SNPs?

modified 12 days ago • written 12 days ago by naive_user10

In my opinion, the best way to identify short variants is to map the reads of one bacterium to the reference genome/scaffolds of another using a samtools/GATK pipeline.

written 12 days ago by toralmanvar290

Well, that does not answer my question. But anyway, as the OP has long-read data, it's fairly difficult to identify SNPs given the relatively high sequencing error rate.

written 11 days ago by naive_user10

Olson ND, Lund SP, Colman RE, et al. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Frontiers in Genetics. 2015;6:235. doi:10.3389/fgene.2015.00235.

Calling SNPs using a genome assembly

SNPs can be identified from genome assemblies, however, since coverage is 1x at each position in an assembly, spurious SNPs cannot be filtered due to insufficient coverage, nor can contaminating genomes be identified and subsequently removed. For individual genes, SNPs are identified by extracting alignments using BLASTN (Altschul et al., 1990) followed by pairwise alignment of the SNPs. For whole genome assemblies, SNPs are typically identified from whole genome alignments made with software such as MUMmer (Kurtz et al., 2004), Mugsy (Angiuoli and Salzberg, 2011), and Mauve (Darling et al., 2004). Software has also been developed for the identification of SNPs from genome assemblies for whole genome phylogenetics including kSNP (Gardner and Hall, 2013) and parSNP (Treangen et al., 2014). SNP identification using assemblies is useful when analyzing individual genes, processing huge datasets, or if raw reads are unavailable. However, when using assemblies for SNP discovery, SNPs cannot be evaluated and verified with the underlying raw read data.
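To make the idea of calling variants from an alignment concrete, here is a minimal Python sketch that walks two gapped alignment rows and reports mismatches and gap runs. It is an illustration only, not how MUMmer, Mugsy or Mauve work internally; the function names and the tuple layout are made up for this example, and no quality filtering is attempted (which is exactly the limitation the passage above describes).

```python
from itertools import groupby


def column_kind(pair):
    """Classify one alignment column as MATCH, SNP, INS or DEL."""
    ref_base, qry_base = pair
    if ref_base == "-":
        return "INS"   # base present only in the query
    if qry_base == "-":
        return "DEL"   # base present only in the reference
    return "SNP" if ref_base.upper() != qry_base.upper() else "MATCH"


def call_variants(ref_aln, qry_aln):
    """Return (kind, ref_pos, ref, alt) tuples from two gapped alignment rows.

    Positions are 1-based on the ungapped reference. Insertions are
    reported at the last reference base before the inserted sequence.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    for kind, cols in groupby(zip(ref_aln, qry_aln), key=column_kind):
        cols = list(cols)
        ref_seq = "".join(r for r, _ in cols if r != "-")
        qry_seq = "".join(q for _, q in cols if q != "-")
        if kind == "SNP":
            # Report each mismatching column as its own SNP.
            for offset, (r, q) in enumerate(cols):
                variants.append(("SNP", ref_pos + offset + 1, r, q))
        elif kind == "INS":
            variants.append(("INS", ref_pos, "-", qry_seq))
        elif kind == "DEL":
            variants.append(("DEL", ref_pos + 1, ref_seq, "-"))
        ref_pos += len(ref_seq)
    return variants


if __name__ == "__main__":
    for variant in call_variants("ACGT-ACGTAC", "ACCTGAC--AC"):
        print(variant)
```

Because each position is supported by a single aligned base, every difference is reported as-is; there is no read depth to distinguish a true variant from an assembly or consensus error.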

Long-read sequencing quality is improving; I actually think that short-read sequencing will be completely replaced by long-read sequencing in the next 5-10 years. That's why I am interested in whole-genome comparisons. As regards the identification of short variants, I realise that comparing genomes/contigs/scaffolds/consensus sequences is less reliable, and I agree that read mapping followed by GATK or FreeBayes SNP calling is probably the best method for identifying SNPs. Still, whole-genome comparisons are useful in some situations (absence of raw reads, for example).

written 8 days ago by Tim110

Thanks, haven't heard about it, will check later.

written 12 days ago by Tim110
Powered by Biostar version 2.3.0