Is anybody aware of a tool that can cluster genomic sequences (e.g. contigs) while also breaking (splitting) sequences when needed?
Traditional clustering tools (CD-HIT, Blastclust) look at the overall similarity between sequences to decide whether they belong in the same cluster. What I'm looking for is a tool that clusters the similar parts of sequences together and leaves out the regions that are not similar. Here's an illustration.
Any idea whether such a tool exists, or how this could be achieved?
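Here is a minimal Python sketch of the behaviour I'm after. It is not an existing tool. It assumes the pairwise local alignments have already been computed (for example with BLAST) and converted to 0-based, half-open, forward-strand coordinates. The function names and toy contigs are hypothetical. Each aligned region becomes a node. The two sides of a hit are linked, overlapping regions on the same contig are linked, and a union-find groups the nodes. Anything not covered by a hit is left out, so a whole contig is never forced into a single cluster.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_intervals(ivs):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(ivs):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def cluster_local_regions(hits):
    """Cluster aligned regions rather than whole sequences (hypothetical helper).

    hits: (qseq, qstart, qend, sseq, sstart, send) tuples, 0-based half-open,
    start < end on both sides (assumed already strand-normalised).
    Returns one {sequence: [(start, end), ...]} dict per cluster.
    """
    regions = []
    for q, qs, qe, s, ss, se in hits:
        regions.append((q, qs, qe))
        regions.append((s, ss, se))
    uf = UnionFind(len(regions))
    # Each hit links its query region to its subject region.
    for i in range(0, len(regions), 2):
        uf.union(i, i + 1)
    # Overlapping regions on the same sequence are treated as the same locus.
    by_seq = defaultdict(list)
    for idx, (name, _, _) in enumerate(regions):
        by_seq[name].append(idx)
    for idxs in by_seq.values():
        idxs.sort(key=lambda i: regions[i][1])
        max_end, max_idx = -1, None  # region reaching furthest right so far
        for i in idxs:
            _, start, end = regions[i]
            if max_idx is not None and start < max_end:
                uf.union(max_idx, i)
            if end > max_end:
                max_end, max_idx = end, i
    # Group regions by cluster, then by sequence, merging overlaps.
    groups = defaultdict(lambda: defaultdict(list))
    for idx, (name, start, end) in enumerate(regions):
        groups[uf.find(idx)][name].append((start, end))
    return [{name: merge_intervals(ivs) for name, ivs in per_seq.items()}
            for per_seq in groups.values()]


def unclustered_parts(seq_lengths, clusters):
    """Return the stretches of each sequence not covered by any cluster."""
    covered = defaultdict(list)
    for cluster in clusters:
        for name, ivs in cluster.items():
            covered[name].extend(ivs)
    leftovers = {}
    for name, length in seq_lengths.items():
        gaps, pos = [], 0
        for start, end in merge_intervals(covered[name]):
            if start > pos:
                gaps.append((pos, start))
            pos = max(pos, end)
        if pos < length:
            gaps.append((pos, length))
        leftovers[name] = gaps
    return leftovers


# Hypothetical toy data: contig A shares one region with B and another with C.
hits = [("A", 100, 400, "B", 0, 300), ("A", 600, 900, "C", 200, 500)]
clusters = cluster_local_regions(hits)
print(clusters)
# → [{'A': [(100, 400)], 'B': [(0, 300)]}, {'A': [(600, 900)], 'C': [(200, 500)]}]
print(unclustered_parts({"A": 1000, "B": 800, "C": 500}, clusters))
# → {'A': [(0, 100), (400, 600), (900, 1000)], 'B': [(300, 800)], 'C': [(0, 200)]}
```

Contig A ends up split between two clusters, and its unaligned stretches are reported separately. Linking any overlap on the same contig is a simplification. It can chain unrelated regions together through repeats, so a real solution would probably need a minimum-overlap or identity threshold.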