Bray-Curtis dissimilarity for comparing genomes, how to normalize?
14 months ago
Buffo ★ 1.9k

Hi everyone,

I am comparing whole genomes of closely related species using Nucmer. I want to calculate the percent similarity based on the bases aligned from each genome using Bray-Curtis dissimilarity (I have also tried the Sorensen coefficient). The problem is that some genomes differ greatly in size (ranging, say, from 40 Mb to 170 Mb). In those cases, I get similarity values above 1 (or 100%), which is impossible. I have tried some normalizations, such as those recommended by Somerfield (2008) and Yoshioka (2008), but nothing worked.
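
For reference, here is a minimal sketch of a Bray-Curtis-style similarity on aligned bases that cannot exceed 1. It rests on assumptions not stated above: that the alignment intervals come from something like a coordinates table (start, end per alignment, per genome), and that values above 1 may arise from counting bases twice when alignments overlap. The function names `covered_bases` and `bray_curtis_similarity` are hypothetical helpers, not part of MUMmer.

```python
def covered_bases(intervals):
    """Count unique bases covered by 1-based, inclusive intervals.

    Overlapping intervals are merged first, so a base hit by several
    alignments is counted once. Reversed intervals (start > end) are
    normalized, since reverse-strand hits may be reported that way.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def bray_curtis_similarity(aligned_a, aligned_b, size_a, size_b):
    """Return 2 * C / (S_a + S_b), with C = min(aligned_a, aligned_b).

    Assumes aligned_a <= size_a and aligned_b <= size_b (unique aligned
    bases), which keeps the result within [0, 1].
    """
    shared = min(aligned_a, aligned_b)
    return 2 * shared / (size_a + size_b)
```

With unique aligned bases as input, 2 * min(a, b) can never exceed the sum of the two genome sizes, so a large size difference lowers the similarity instead of pushing it past 100%.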

Any suggestions or alternatives?

genome mummer bray-curtis normalization sorensen • 472 views

I suggest you try FastANI. It compares sequences without actual alignment by calculating k-mer similarity, which in most cases is related to sequence identity. The upside for your purposes is that sequences of different lengths can be used.
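
As a rough sketch of how the results could be used afterwards, the snippet below parses FastANI's output, assuming the tab-delimited layout of query, reference, ANI, number of mapped fragments, and total query fragments. The function name `parse_fastani` and the derived `aligned_fraction` field are my own additions for illustration, not something FastANI provides.

```python
import csv


def parse_fastani(handle):
    """Parse FastANI-style tab-delimited rows from an open text handle.

    Assumed columns: query, reference, ANI (%), mapped fragments,
    total query fragments. Rows with fewer than five fields are skipped.
    """
    rows = []
    for rec in csv.reader(handle, delimiter="\t"):
        if len(rec) < 5:
            continue
        query, reference, ani, mapped, total = rec[:5]
        mapped, total = int(mapped), int(total)
        rows.append({
            "query": query,
            "reference": reference,
            "ani": float(ani),
            # Share of query fragments that mapped; always between 0 and 1.
            "aligned_fraction": mapped / total if total else 0.0,
        })
    return rows
```

Usage would be something like `with open("fastani.out") as fh: results = parse_fastani(fh)`, where `fastani.out` is a placeholder file name. Reporting the aligned fraction next to ANI shows how much of the query genome actually contributed to the estimate, which matters when genome sizes differ a lot.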


FastANI is faster than other programs such as Nucmer, but it has a few disadvantages:

- It is not recommended for fragmented or short genomes (N50 should be ≥10 Kbp).
- No ANI output is reported for a genome pair if the ANI value is much below 80%.
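
To check the N50 recommendation before running FastANI, something like the sketch below could work. It is plain Python with no dependencies; `contig_lengths` and `n50` are hypothetical helper names, and the 10 Kbp threshold is simply the figure quoted in the list above.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig down, first reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
```

For example, `n50(contig_lengths(open("assembly.fa"))) >= 10_000` would flag whether an assembly meets the threshold (`assembly.fa` is a placeholder file name).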


In my case, though, FastANI would be a good alternative. Thanks.
