Whole-genome alignment of two or more bacterial genomes - find structural variants
5.9 years ago
Tim ▴ 130

Hello,

Let's say I have two or more complete bacterial genome sequences produced by Sanger sequencing and/or Nanopore/PacBio (no Illumina reads). The genomes in question are 95-99% identical at the nucleotide level. What would be the best way to align these genomes (with a pairwise or multiple alignment) and identify:

  • Short variants: single and multiple nucleotide variations (SNP/MNP), indels
  • Long variants: longer deletions and insertions, inversions, duplications, translocations and so on

I am aware of MUMmer, Mauve and Mugsy; what other programs should I check? It would be great if they could also produce a VCF file.

Thanks.

SNP structural variant whole-genome alignment • 5.6k views
5.9 years ago
kcamnairb ▴ 40

NucDiff looks interesting. I haven't been able to try it yet, but I know it can output a VCF file.

NucDiff looks interesting for sure, will give it a go, thanks.

Thank you for recommending NucDiff, it does exactly what I wanted.

5.9 years ago
Tm ★ 1.1k

For structural variation analysis, you can try Assemblytics, which takes the .delta file generated by NUCmer (NUCleotide MUMmer) as input.
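To make the .delta input a bit more concrete, here is a minimal Python sketch that reads the alignment headers from a NUCmer delta file and flags alignments whose reference and query orientations disagree, which is one rough hint of an inversion. This is not how Assemblytics works internally; it only illustrates the file layout. The names `DeltaAlignment`, `parse_delta` and `EXAMPLE_DELTA`, the file paths and all coordinates are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeltaAlignment:
    ref_name: str
    qry_name: str
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    errors: int

    @property
    def reverse(self) -> bool:
        # Reverse-strand matches show up as start > end on one of the two sequences.
        return (self.ref_start > self.ref_end) != (self.qry_start > self.qry_end)


def parse_delta(text: str) -> List[DeltaAlignment]:
    """Collect alignment header lines from delta-format text (indel offsets are ignored)."""
    lines = text.strip().splitlines()
    alignments: List[DeltaAlignment] = []
    ref_name = qry_name = ""
    # lines[0] holds the two input paths, lines[1] the data type (e.g. NUCMER).
    for line in lines[2:]:
        fields = line.split()
        if line.startswith(">"):
            # Sequence pair header: >ref_id qry_id ref_len qry_len
            ref_name, qry_name = fields[0][1:], fields[1]
        elif len(fields) == 7:
            # Alignment header: ref_start ref_end qry_start qry_end errors sim_errors stops
            rs, re_, qs, qe, err = (int(x) for x in fields[:5])
            alignments.append(DeltaAlignment(ref_name, qry_name, rs, re_, qs, qe, err))
        # Single-integer lines are indel offsets (terminated by 0); skipped in this sketch.
    return alignments


# Hypothetical delta content, invented for this example.
EXAMPLE_DELTA = """\
/path/ref.fasta /path/qry.fasta
NUCMER
>chr1 contig1 5000 5000
1 2000 1 2000 3 3 0
0
2001 3000 3000 2001 5 5 0
-120
0
3001 5000 3001 5000 2 2 0
0
"""

alignments = parse_delta(EXAMPLE_DELTA)
candidate_inversions = [a for a in alignments if a.reverse]
```

A reverse-strand block flanked by forward blocks on the same sequence pair is only a candidate; dedicated tools also check uniqueness of the anchors and the gaps between neighbouring alignments.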

@toral

What about short variants? How would a genome aligner be able to identify SNPs?

In my opinion, the best way to identify short variants is to map the reads of one bacterium to the reference genome/scaffolds of another using a samtools/GATK pipeline.

Well, that does not answer my question. Anyway, as the OP has long-read data, it's fairly difficult to identify SNPs given the relatively high sequencing error rate.

Olson ND, Lund SP, Colman RE, et al. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Frontiers in Genetics. 2015;6:235. doi:10.3389/fgene.2015.00235.

Calling SNPs using a genome assembly

SNPs can be identified from genome assemblies, however, since coverage is 1x at each position in an assembly, spurious SNPs cannot be filtered due to insufficient coverage, nor can contaminating genomes be identified and subsequently removed. For individual genes, SNPs are identified by extracting alignments using BLASTN (Altschul et al., 1990) followed by pairwise alignment of the SNPs. For whole genome assemblies, SNPs are typically identified from whole genome alignments made with software such as MUMmer (Kurtz et al., 2004), Mugsy (Angiuoli and Salzberg, 2011), and Mauve (Darling et al., 2004). Software has also been developed for the identification of SNPs from genome assemblies for whole genome phylogenetics including kSNP (Gardner and Hall, 2013) and parSNP (Treangen et al., 2014). SNP identification using assemblies is useful when analyzing individual genes, processing huge datasets, or if raw reads are unavailable. However, when using assemblies for SNP discovery, SNPs cannot be evaluated and verified with the underlying raw read data.

Long-read sequencing quality is improving; I actually think short-read sequencing will be completely replaced by long-read sequencing in the next 5-10 years. That's why I am interested in whole-genome comparisons. As for short variants, I realise that comparing genomes/contigs/scaffolds/consensus sequences is less reliable, and I agree that read mapping followed by GATK or FreeBayes SNP calling is probably the best method for identifying SNPs. Still, whole-genome comparisons are useful in some situations (when raw reads are unavailable, for example).
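For anyone wondering what calling short variants from an alignment (rather than from reads) looks like in practice, here is a minimal Python sketch. It assumes you already have one pairwise-aligned block as two gapped, uppercase strings of equal length, walks the columns, and reports each run of differing columns as a SNP, MNP, insertion, deletion or complex event with VCF-style anchoring. The function names, the `TYPE` INFO key and the toy sequences are my own; real tools also handle multiple blocks, reverse-strand alignments and ambiguous bases, and there is no coverage or quality information to filter on.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str, str]  # (1-based POS, REF, ALT, type)


def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """Report runs of differing columns in a gapped pairwise alignment."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    if any(r == "-" == q for r, q in zip(ref_aln, qry_aln)):
        raise ValueError("a column cannot be a gap in both sequences")

    variants: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == qry_aln[i]:
            ref_pos += 1
            i += 1
            continue
        # Extend over the maximal run of columns that differ.
        j = i
        while j < n and ref_aln[j] != qry_aln[j]:
            j += 1
        ref_allele = ref_aln[i:j].replace("-", "")
        alt_allele = qry_aln[i:j].replace("-", "")
        pos = ref_pos + 1
        if not ref_allele or not alt_allele:
            # Pure insertion or deletion: VCF anchors it on the preceding base.
            if i == 0:
                raise ValueError("this sketch cannot anchor an indel in the first column")
            anchor = ref_aln[i - 1]
            kind = "INS" if not ref_allele else "DEL"
            variants.append((ref_pos, anchor + ref_allele, anchor + alt_allele, kind))
        elif len(ref_allele) == len(alt_allele):
            kind = "SNP" if len(ref_allele) == 1 else "MNP"
            variants.append((pos, ref_allele, alt_allele, kind))
        else:
            variants.append((pos, ref_allele, alt_allele, "COMPLEX"))
        ref_pos += len(ref_allele)
        i = j
    return variants


def to_vcf_line(chrom: str, variant: Variant) -> str:
    """Format one variant as a tab-separated VCF body line (no QUAL or FILTER)."""
    pos, ref, alt, kind = variant
    return f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\tTYPE={kind}"


# Toy aligned block, invented for this example.
REF_ALN = "ACGTACGT--ACGTAC"
QRY_ALN = "ACCTAC-TGGACAAAC"

calls = call_variants(REF_ALN, QRY_ALN)
vcf_lines = [to_vcf_line("chr1", v) for v in calls]
```

A valid VCF file would additionally need the header lines, including a definition for the `TYPE` INFO key used here.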

Thanks, haven't heard about it, will check later.

4.7 years ago

Can I know the answer after 14 months?

"What would be the best way to align these genomes (with a pairwise or multiple alignment) and identify" these variants?

It would help me with my objective.

Hi, did you find your answer?

Could I know the solution after 22 months? lol
