7.0 years ago
igor
13k
You can run uniq -c to count occurrences of unique strings. Is there a way to do that while allowing for mismatches? If these are sequencing reads, then you may expect mismatches.
For example, for grep, there is agrep ("approximate GREP for fast fuzzy string searching").
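To make the question concrete, here is a rough sketch of what a "fuzzy uniq -c" could look like in Python. It is only an illustration, not how any existing tool works: the function names (hamming, fuzzy_uniq_count) and the max_mismatches parameter are made up for this example. It assumes that only substitutions matter (Hamming distance, so reads of different lengths never merge), and it greedily assigns each distinct read to the most abundant representative within the allowed number of mismatches. It compares every distinct read against every representative, so it is only practical for small inputs.

```python
from collections import Counter


def hamming(a, b):
    """Return the number of mismatching positions, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def fuzzy_uniq_count(reads, max_mismatches=1):
    """Greedy approximate counting (hypothetical helper, not a real tool's API).

    Distinct reads are visited from most to least abundant (ties broken
    alphabetically). Each read is added to the first existing representative
    within max_mismatches substitutions; otherwise it starts a new group.
    """
    exact = Counter(reads)  # same information as: sort | uniq -c
    clusters = {}  # representative sequence -> total count
    for seq, n in sorted(exact.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in clusters:
            d = hamming(seq, rep)
            if d is not None and d <= max_mismatches:
                clusters[rep] += n
                break
        else:
            # No representative was close enough, so seq becomes one.
            clusters[seq] = n
    return clusters


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGA", "TTTT", "ACGT", "TTTA", "GGGG"]
    for seq, n in fuzzy_uniq_count(reads, max_mismatches=1).items():
        print(n, seq)
```

Because the assignment is greedy, the result depends on the visiting order: a read that lies within one mismatch of two representatives goes to whichever was created first.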
Clumpify from BBMap? Count the clumps. I am not sure whether @Brian ever added a feature to report read counts for the individual clumps, though I had asked him for that as a feature request.
I think you can experiment with the command-line parameters of dedupe.sh or cd-hit to achieve what you want, or at least something close to it.