3.5 years ago
Buffo
Hi everyone,
I am comparing whole genomes of closely related species using Nucmer. I want to calculate the percent similarity from the number of bases aligned in each genome, using Bray-Curtis dissimilarity (I have also tried the Sorensen coefficient). The problem is that some genomes differ greatly in size (ranging from about 40 Mb to 170 Mb). In those cases I get similarity values above 1 (or 100%), which is impossible. I have tried some normalizations, such as those recommended by Somerfield (2008) and Yoshioka (2008), but nothing worked.
Any suggestions or alternatives?
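To make the calculation concrete, here is a minimal Python sketch of a Sorensen-style similarity computed from aligned bases. It rests on assumptions that are not confirmed in this thread: that the alignment intervals come from something like Nucmer coordinate output for a single sequence, and that one possible cause of values above 1 is summing overlapping alignments, so that repeated regions are counted more than once. The function names are hypothetical. Counting each covered position only once keeps the aligned bases at or below the genome length, which bounds the result at 1.

```python
def covered_bases(intervals):
    """Count unique positions covered by 1-based inclusive (start, end) intervals.

    Overlapping intervals are merged so no position is counted twice.
    Intervals with start > end (reverse-strand hits) are normalized first.
    Assumes all intervals lie on the same sequence.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def sorensen_similarity(aligned_a, aligned_b, len_a, len_b):
    """Sorensen-style similarity: aligned bases over total bases in both genomes."""
    return (aligned_a + aligned_b) / (len_a + len_b)


# Toy example: two overlapping hits and one separate hit on genome A.
hits_a = [(1, 100), (50, 150), (200, 250)]
naive = sum(abs(e - s) + 1 for s, e in hits_a)   # overlaps counted twice
unique = covered_bases(hits_a)                   # overlaps counted once
print(naive, unique)
```

With genomes split across several contigs, the same merge would need to be run per contig and the totals summed.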
I suggest you try FastANI. It compares sequences without a full alignment by calculating k-mer similarity, which in most cases correlates well with sequence identity. The upside for your purposes is that sequences of different lengths can be used.
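As a rough illustration of what k-mer similarity means, here is a toy Python sketch. It is not FastANI's actual algorithm, which differs in its details; the function names and the choice of k are assumptions made for the example, and reverse complements are ignored for brevity. It shows why a containment-style score (shared k-mers over the smaller set) is less affected by a size difference than the Jaccard index.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a, seq_b, k=4):
    """Shared k-mers divided by all distinct k-mers in either sequence."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(seq_a, seq_b, k=4):
    """Shared k-mers divided by the k-mer count of the smaller set."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


# A short sequence fully contained in a longer one.
short_seq = "ACGT"
long_seq = "ACGTTTTT"
print(jaccard(short_seq, long_seq), containment(short_seq, long_seq))
```

Both scores stay between 0 and 1 by construction, whatever the sizes of the two inputs.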
FastANI is faster than other programs such as Nucmer, but it has a few disadvantages such as:
But in my case it would be a good alternative. Thanks.