How to determine % similarity between genomes?
22 months ago
A_heath ▴ 120

Hi all,

I am aligning multiple bacterial genomes and I would like to know how I can obtain a % identity between these genomes.

Is that something that either Mugsy or Mauve can report?

Audrey

genome alignement mugsy mauve % identity • 747 views
22 months ago
5heikki 11k

I recommend Mash:

mash dist genome1.fna genome2.fna
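If you would rather collect the numbers in a script, a minimal Python sketch could wrap the call as follows. It assumes `mash` is installed and on your PATH, and that `mash dist` prints its usual tab-separated columns (reference, query, distance, p-value, shared hashes). The names `MashHit`, `parse_mash_dist_line` and `mash_dist` are made up for this illustration.

```python
import subprocess
from typing import List, NamedTuple


class MashHit(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared: int  # number of matching hashes
    total: int   # sketch size the hashes were compared against


def parse_mash_dist_line(line: str) -> MashHit:
    """Parse one tab-separated line of `mash dist` output."""
    ref, query, dist, pval, hashes = line.rstrip("\n").split("\t")
    shared, total = hashes.split("/")
    return MashHit(ref, query, float(dist), float(pval), int(shared), int(total))


def mash_dist(reference: str, query: str) -> List[MashHit]:
    """Run `mash dist` (assumed to be on PATH) and parse every output line."""
    result = subprocess.run(
        ["mash", "dist", reference, query],
        capture_output=True, text=True, check=True,
    )
    return [parse_mash_dist_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `mash_dist("genome1.fna", "genome2.fna")` should then return one `MashHit` per reference/query pair, which you can sort by `distance` to find the closest genome.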

I used Mash and got the following results:

Mygenome.fasta Close_genome_1.fasta 0.0196 494/1000

Mygenome.fasta Close_genome_2.fasta 0.0174 530/1000

I do not really understand the meaning of the two scores.

In this case, which genome is closer? Genome 1 or 2?

Close_genome_2 is closer. ANI is approximately 1 - Mash distance, so here 1 - 0.0174 = 0.9826, i.e. roughly 98.26% similarity. The last column shows the number of shared hashes out of the sketch size (1,000 by default). You can get more precise results if you sketch your genomes first with e.g. a k-mer size of 17 and a sketch size of 10,000 (mash sketch -k 17 -s 10000 input.fna) and then compare the resulting .msh files with mash dist.
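To make the arithmetic concrete, here is a small Python sketch. It assumes Mash's default k-mer size of 21 and the Mash distance formula D = -(1/k) ln(2j / (1 + j)), where j is the fraction of shared hashes. The function names are only for illustration, and 1 - D should be read as an estimate of ANI rather than an exact value.

```python
import math


def mash_distance(shared: int, sketch_size: int, k: int = 21) -> float:
    """Mash distance from the shared-hash fraction j (a Jaccard estimate)."""
    j = shared / sketch_size
    if j == 0:
        return 1.0  # no shared hashes: treat as the maximal distance
    return -math.log(2 * j / (1 + j)) / k


def approx_ani(distance: float) -> float:
    """Rough ANI estimate in percent: (1 - Mash distance) * 100."""
    return (1.0 - distance) * 100.0


# Values taken from the results posted above (k = 21 is an assumption).
for name, reported, shared in [("Close_genome_1", 0.0196, 494),
                               ("Close_genome_2", 0.0174, 530)]:
    recomputed = mash_distance(shared, 1000)
    print(f"{name}: reported D={reported}, recomputed D={recomputed:.4f}, "
          f"ANI ~ {approx_ani(reported):.2f}%")
```

The recomputed distances should agree with the reported ones to within rounding, which shows that the two columns carry the same information: more shared hashes means a smaller distance and a higher estimated ANI.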

22 months ago
Carambakaracho ★ 3.1k

What you're probably looking for is average nucleotide identity (ANI).

This is a tool I always wanted to test, but it is no longer relevant for me.

Further reading from a quick web search:

https://www.sciencedirect.com/science/article/pii/S0580951714000087

https://img.jgi.doe.gov/docs/ANI.pdf