How can I
You can use CD-HIT to cluster related sequences by sequence identity. Identify the clusters, pull out the sequences belonging to each cluster, and run motif-finding tools on each cluster in turn.
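The per-cluster step could look something like the minimal Python sketch below. It assumes CD-HIT has already been run and written its `.clstr` file, whose member lines look like `0\t120aa, >seqA... *` (representative) or `1\t118aa, >seqB... at 91.53%`. The helper names (`parse_clstr`, `read_fasta`, `cluster_fastas`, `write_cluster_files`) and the `min_size` filter are hypothetical. CD-HIT truncates IDs in the `.clstr` file by default, so running it with `-d 0` (keep the header up to the first space) makes the IDs match the first word of each FASTA header, which is what this sketch keys on.

```python
import re
from collections import defaultdict

# Member lines look like "0\t120aa, >seqA... *" or "1\t118aa, >seqB... at 91.53%";
# the sequence ID sits between ">" and the "..." that CD-HIT appends.
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Map cluster number -> list of member IDs from CD-HIT .clstr lines."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif line and current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def read_fasta(lines):
    """Return {id: sequence}, keyed on the first word of each header line."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def cluster_fastas(clusters, seqs, min_size=2):
    """Build one FASTA string per cluster, skipping clusters smaller than min_size."""
    out = {}
    for cid, members in sorted(clusters.items()):
        if len(members) < min_size:
            continue
        out[cid] = "".join(f">{sid}\n{seqs[sid]}\n" for sid in members)
    return out


def write_cluster_files(clstr_path, fasta_path, prefix="cluster"):
    """Write <prefix>_<n>.fasta files, one per cluster, ready for a motif finder."""
    with open(clstr_path) as fh:
        clusters = parse_clstr(fh)
    with open(fasta_path) as fh:
        seqs = read_fasta(fh)
    for cid, fasta in cluster_fastas(clusters, seqs).items():
        with open(f"{prefix}_{cid}.fasta", "w") as out:
            out.write(fasta)
```

Each resulting file can then be passed to whichever motif-finding tool you prefer. The default `min_size=2` skips singleton clusters, since a single sequence gives a motif finder nothing to compare against.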
You might consider using Mash distances and defining a sequence-similarity cutoff.
Mash distances are, I believe, based on k-mer content, so that approach would go a long way toward addressing all these points at once.
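To make the cutoff idea concrete, here is a minimal Python sketch. Mash estimates the Jaccard index j between the k-mer sets of two sequences (from MinHash sketches) and converts it to a distance D = -(1/k) ln(2j / (1 + j)), which roughly tracks the per-base mutation rate. As simplifying assumptions, this sketch computes j exactly from full k-mer sets instead of sketching, counts k-mers on one strand only, caps distances at 1.0, and groups sequences by single linkage. The values `k=5` and `cutoff=0.05` (very roughly 95% identity) are illustrative only, and the function names are hypothetical; on real data the `mash` tool itself would compute the distances.

```python
import math
from itertools import combinations


def kmer_set(seq, k):
    """All length-k substrings of seq (single strand, no reverse complements)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a, b, k=5):
    """Mash-style distance from the exact k-mer Jaccard index (no MinHash sketching)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    j = len(ka & kb) / len(union) if union else 0.0
    if j == 0.0:
        return 1.0  # no shared k-mers: treat as maximally distant in this sketch
    # -(1/k) * ln(2j / (1 + j)), written as ln((1 + j) / (2j)) / k, capped at 1.0
    return min(1.0, math.log((1 + j) / (2 * j)) / k)


def cluster_by_cutoff(seqs, cutoff, k=5):
    """Single-linkage clustering: join any pair whose distance is <= cutoff."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if mash_distance(seqs[a], seqs[b], k) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Single linkage is the simplest way to apply a cutoff, but it can chain dissimilar sequences together through intermediates, so a stricter rule (e.g. requiring every pair in a group to pass the cutoff) may suit motif finding better. Each resulting group can then be handed to the motif finder separately.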