I've performed a core SNP analysis on a set of E. coli isolates, and my question is now rather philosophical: what would you say is the cutoff for calling one sequence/clade different from another? For example, in a more wet-lab approach such as PFGE, I would consider 90% similarity to be the cutoff. With SNPs, I believe it is much more difficult to set this breakpoint because (please correct me if I'm wrong):
1) SNP phylogeny has a much higher resolution.
2) Because of that higher resolution, some inherent errors are inevitable.
3) The number of SNPs that can be found is not constant across species/isolates (or it might be, but only within a fairly wide range).
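To make concrete what I mean by a cutoff, here is a minimal Python sketch of the kind of rule I have in mind: count pairwise SNP differences in a core alignment and flag pairs at or below a threshold. Everything in it is an assumption for illustration only: the function names `snp_distance` and `pairs_within_cutoff` are hypothetical, the sequences are made-up toy data, and the cutoff of 2 SNPs is arbitrary, not a recommendation.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a character listed in `ignore`
    (ambiguous bases or gaps) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairs_within_cutoff(alignment, cutoff):
    """Return (name_a, name_b, distance) for each pair with distance <= cutoff."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alignment.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= cutoff:
            hits.append((name_a, name_b, distance))
    return hits


# Toy core-SNP alignment (made-up data, not real isolates).
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "TCGAACCTAT",
}

# Arbitrary cutoff of 2 SNPs, chosen only to show the mechanics.
print(pairs_within_cutoff(toy_alignment, cutoff=2))
```

With this toy data only isolate_1 and isolate_2 fall under the cutoff, but the choice of that number is exactly what I don't know how to justify.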
Any opinions/discussions would be very welcome!