
6.9 years ago

jeccy.J

Can anyone suggest how to cluster a set of bacterial genomes based on their Hamming or SNP distance?

How to cluster bacterial genomes based on Hamming distance


It would be pointless to apply CD-HIT to complete bacterial genome sequences (unless they were very similar, sharing exactly the same gene order and so on). A better strategy might be to build a distance matrix, e.g. with all-vs-all MUMmer comparisons. Counting shared k-mers could also give a reasonably representative distance matrix.
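As a rough sketch of the shared k-mer idea, assuming the genomes are available as plain character strings: the Jaccard distance between k-mer sets below is one common choice of k-mer distance, but the helper names (`kmer_set`, `jaccard_dist`, `kmer_dist_matrix`), k = 4, and the toy sequences are all hypothetical. It only counts forward-strand k-mers and is far too slow for real genomes; it just illustrates the idea.

```r
# Hypothetical helper: set of distinct k-mers in a DNA string (forward strand only)
kmer_set <- function(seq, k) {
  n <- nchar(seq)
  if (n < k) return(character(0))
  unique(substring(seq, 1:(n - k + 1), k:n))
}

# Jaccard distance between two k-mer sets: 1 - |A intersect B| / |A union B|
jaccard_dist <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# All-vs-all distance matrix for a named character vector of sequences
kmer_dist_matrix <- function(seqs, k = 4) {
  sets <- lapply(seqs, kmer_set, k = k)
  n <- length(sets)
  m <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- jaccard_dist(sets[[i]], sets[[j]])
    }
  }
  as.dist(m)  # "dist" object, usable directly by hclust() or cmdscale()
}

# Toy sequences (hypothetical): g1 and g2 differ by one base, g3 is unrelated
genomes <- c(g1 = "ACGTACGTTAGCATGCA",
             g2 = "ACGTACGTTAGCATGGA",
             g3 = "TTTTGGGGCCCCAAAAT")
d <- kmer_dist_matrix(genomes, k = 4)
```

The single substitution between g1 and g2 changes only a few overlapping 4-mers, so their distance stays small, while g3 shares no 4-mers with g1 and gets the maximum distance of 1.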


More detail is really needed. What exactly is your problem? How to calculate the Hamming or SNP distance between two genomes? Which clustering algorithm to use once you've calculated the Hamming distances? Must it be Hamming or SNP distance, or are you in fact looking for distance metrics better suited to the problem you are trying to solve? How closely related are the genomes you want to cluster?
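If the genomes are already aligned (for example a core-genome or whole-genome alignment), the SNP/Hamming distance is simply the number of positions at which two sequences differ. A minimal sketch, assuming equal-length aligned sequences; skipping positions with a gap or N in either sequence is an assumption (other tools handle these differently), and `snp_dist`, `snp_dist_matrix` and the toy alignment are hypothetical.

```r
# Hamming (SNP) distance between two aligned sequences of equal length.
# Positions where either sequence has a gap ("-") or an ambiguous base ("N") are skipped.
snp_dist <- function(a, b) {
  x <- strsplit(toupper(a), "")[[1]]
  y <- strsplit(toupper(b), "")[[1]]
  stopifnot(length(x) == length(y))
  ok <- !(x %in% c("-", "N") | y %in% c("-", "N"))
  sum(x[ok] != y[ok])
}

# Pairwise SNP distances for a named character vector of aligned sequences
snp_dist_matrix <- function(aln) {
  n <- length(aln)
  m <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- snp_dist(aln[[i]], aln[[j]])
    }
  }
  as.dist(m)
}

# Toy alignment (hypothetical)
aln <- c(g1 = "ACGTACGTAC",
         g2 = "ACGTTCGTAC",
         g3 = "ACCTTCGAAC",
         g4 = "ACGT-CGTAN")
d <- snp_dist_matrix(aln)
```

The resulting `dist` object can be passed straight to any clustering routine that accepts a precomputed distance matrix.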

Just get an M×N data matrix (e.g. genomes × variant sites) and use simple Ward clustering, or you could even try MDS. Both can be done in R. For example, Ward clustering with Manhattan distance using pvclust:

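library(pvclust)
fit <- pvclust(data = t(mydata), method.hclust = "ward.D", method.dist = "manhattan", nboot = 10000)  # pvclust clusters columns, so t() puts the genomes (rows of mydata) in columns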

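Additionally, you get a p-value for each clade, based on how often that cluster is recovered across the bootstrap replicates (pvclust reports both AU and BP values, and plot(fit) shows them on the dendrogram).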
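pvclust computes the distances itself from the data matrix. If you already have a distance matrix (for example Hamming/SNP distances), Ward clustering and classical MDS can be run on it directly with base R. A minimal sketch; the toy 0/1 SNP matrix is hypothetical, and it uses "ward.D2" rather than "ward.D" because the `hclust` documentation notes that only "ward.D2" implements Ward's (1963) criterion.

```r
# Hypothetical toy data: rows = genomes, columns = variable sites,
# coded 0 = reference allele, 1 = alternative allele.
snps <- matrix(c(0, 0, 0, 0, 0, 0,
                 1, 0, 0, 0, 0, 0,
                 0, 1, 1, 1, 1, 0,
                 0, 1, 1, 1, 0, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3", "g4"), NULL))

# On 0/1 data the Manhattan distance equals the Hamming (SNP) distance.
d <- dist(snps, method = "manhattan")

# Ward clustering on the precomputed distances, then cut the tree into 2 groups.
hc <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 2)

# Classical (metric) MDS: 2-D coordinates whose Euclidean distances approximate d.
mds <- cmdscale(d, k = 2)
```

`plot(hc)` draws the dendrogram and `plot(mds)` the 2-D MDS map. This toy example also shows why Manhattan distance is a reasonable choice in the pvclust call when the data are 0/1-coded SNPs: it reduces to the SNP count.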