usearch11 cluster_fast option
11 weeks ago
g.papp-co ▴ 10

I have a small example FASTA file of protein fragments that I am using to figure out how usearch works:

fragments

>seq1
TKEHALSKERAA

>seq2
KKEHALSKERAR

>seq3
AAHAASAERAAE

>seq4
AAHAASAERAAS


I used usearch with the following options:

usearch -cluster_fast ex.fasta -id 0.5 -uc cluster.uc


I expected at most 2 clusters, (seq1, seq2) and (seq3, seq4), but the result contains only singletons. I lowered the identity threshold and also the gap penalties, but the result is the same.
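As a rough check on that expectation, the position-by-position identity of the four fragments can be computed by hand. The sketch below is only an ungapped comparison of equal-length strings; it is not how usearch aligns or scores sequences, and the function name `ungapped_identity` is made up for illustration.

```python
from itertools import combinations

# The four fragments from the example FASTA file above.
fragments = {
    "seq1": "TKEHALSKERAA",
    "seq2": "KKEHALSKERAR",
    "seq3": "AAHAASAERAAE",
    "seq4": "AAHAASAERAAS",
}


def ungapped_identity(a, b):
    """Fraction of identical positions between two equal-length sequences.

    Illustrative helper only: no alignment is performed, so this is a
    simplification of what a real clustering tool computes.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Print the identity for every pair of fragments.
for (name_a, seq_a), (name_b, seq_b) in combinations(fragments.items(), 2):
    print(f"{name_a} vs {name_b}: {ungapped_identity(seq_a, seq_b):.2f}")
```

Under this simplified measure, seq1/seq2 and seq3/seq4 are both well above the 0.5 threshold, while every cross pair is well below it, which is why two clusters seem like the natural outcome.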

Any idea?

Thank You

usearch proteins clustering • 179 views
10 weeks ago
Mensur Dlakic ★ 13k

I think your sequences are too short. Try reducing the -minhsp value, though even that may not work at these sequence lengths. If you duplicate each sequence so that its length is 24 instead of 12, you will get the desired clusters.
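The duplication step can be scripted. This is a minimal sketch in plain Python, assuming a simple FASTA layout with one header line per record; the function name `duplicate_fasta_sequences` and the `copies` parameter are hypothetical and not part of usearch.

```python
def duplicate_fasta_sequences(fasta_text, copies=2):
    """Return FASTA text in which every sequence is repeated `copies` times.

    Hypothetical helper for illustration: blank lines are skipped and
    sequences split over several lines are joined before being repeated.
    """
    records = []
    header = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    # One header line followed by one (lengthened) sequence line per record.
    return "\n".join(f"{h}\n{s * copies}" for h, s in records) + "\n"


# Two of the 12-residue fragments from the question become 24 residues long.
example = ">seq1\nTKEHALSKERAA\n\n>seq2\nKKEHALSKERAR\n"
print(duplicate_fasta_sequences(example), end="")
```

To try it on the real input, read ex.fasta into a string, write the returned text to a new file, and rerun the same -cluster_fast command on that file.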