counting sequences with fuzzy uniq (with mismatches)
7.0 years ago
igor 13k

You can run uniq -c to count occurrences of unique strings. Is there a way to do that accounting for mismatches? If these are sequencing reads, then you may expect mismatches.

For example, grep has a fuzzy counterpart in agrep ("approximate GREP for fast fuzzy string searching").

unix cli • 1.4k views

Clumpify from BBMap might work: run it and count the clumps. I am not sure whether @Brian ever added an option to report the number of reads in each clump, although I asked him for that as a feature request.


I think you can experiment with the command-line parameters of dedupe.sh or cd-hit to achieve what you want, or at least something close to it.
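
If the reads are all the same length and only substitutions matter, the idea can also be sketched in a few lines of Python. This is only an illustration of greedy clustering by Hamming distance, not what dedupe.sh or cd-hit do internally; the function names `hamming` and `fuzzy_uniq_count` and the `max_mismatches` parameter are made up for this example. It assumes the input fits in memory and ignores indels.

```python
from collections import Counter


def hamming(a, b):
    """Return the number of mismatching positions, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def fuzzy_uniq_count(seqs, max_mismatches=1):
    """Greedy, mismatch-tolerant analogue of `sort | uniq -c` (hypothetical helper).

    Sequences are first counted exactly, then visited from most to least
    abundant. Each one is merged into the first existing representative
    within `max_mismatches` substitutions, or becomes a new representative.
    """
    exact = Counter(seqs)
    # Most abundant first; ties broken alphabetically so the result is deterministic.
    ordered = sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))
    clusters = {}  # representative sequence -> total count
    for seq, n in ordered:
        for rep in clusters:
            d = hamming(seq, rep)
            if d is not None and d <= max_mismatches:
                clusters[rep] += n
                break
        else:
            # No representative was close enough, so start a new cluster.
            clusters[seq] = n
    return clusters


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTA", "TTTT", "GGGG"]
    for rep, n in fuzzy_uniq_count(reads, max_mismatches=1).items():
        print(n, rep)
```

Each distinct sequence is compared against every representative found so far, so this is quadratic in the worst case and only practical for modest numbers of distinct sequences. The result is also order-dependent: a read within the threshold of two representatives is assigned to the more abundant one.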

