usearch11 cluster_fast option
11 weeks ago
g.papp-co ▴ 10

I have a small example FASTA file of protein fragments that I am using to figure out how usearch works:

fragments

>seq1
TKEHALSKERAA

>seq2
KKEHALSKERAR

>seq3
AAHAASAERAAE

>seq4
AAHAASAERAAS

I used usearch with the following options:

usearch -cluster_fast ex.fasta -id 0.5 -uc cluster.uc

I expected at most two clusters, (seq1, seq2) and (seq3, seq4), but the result contains only singletons. I decreased the identity threshold and also the gap penalties, but the result is the same.
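To make the expectation concrete, here is a minimal Python sketch that counts matching positions between each pair of fragments. It is only a sanity check: the helper name `ungapped_identity` is made up for this illustration, and a plain position-by-position comparison is an assumption that will not necessarily match how usearch computes identity from its alignments.

```python
from itertools import combinations

# The four fragments from the example file.
fragments = {
    "seq1": "TKEHALSKERAA",
    "seq2": "KKEHALSKERAR",
    "seq3": "AAHAASAERAAE",
    "seq4": "AAHAASAERAAS",
}


def ungapped_identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Hypothetical helper: a simple ungapped comparison, not usearch's
    alignment-based identity.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Print the identity for every pair of fragments.
for (name_a, seq_a), (name_b, seq_b) in combinations(fragments.items(), 2):
    print(f"{name_a} vs {name_b}: {ungapped_identity(seq_a, seq_b):.2f}")
```

By this count seq1 and seq2 agree at 10 of 12 positions and seq3 and seq4 at 11 of 12, while every cross pair agrees at only 2 of 12, which is why two clusters seem reasonable at -id 0.5.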

Any idea?

Thank You

usearch proteins clustering • 179 views
10 weeks ago
Mensur Dlakic ★ 13k

I think your sequences are too short. Try reducing the -minhsp value, though even that may not work with sequences this short. If you duplicate each sequence, making the length 24 instead of 12, you will get the desired clusters.
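The duplication test can be scripted rather than done by hand. The sketch below is one way to do it in Python; the function name `double_sequences` is hypothetical, and it assumes each record has its sequence on a single line, as in the example file.

```python
def double_sequences(fasta_text):
    """Return FASTA text in which every sequence line is concatenated to itself.

    Hypothetical helper for illustration. Assumes one sequence line per
    record; blank lines are dropped and header lines are kept unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            out.append(line)  # header line
        else:
            out.append(line + line)  # 12 residues become 24
    return "\n".join(out) + "\n"


# The four fragments from the question.
example = """>seq1
TKEHALSKERAA

>seq2
KKEHALSKERAR

>seq3
AAHAASAERAAE

>seq4
AAHAASAERAAS
"""

print(double_sequences(example), end="")
```

Saving the printed output to a new file (any name will do) and rerunning the same -cluster_fast command on it is then a quick way to check whether sequence length is what prevents the fragments from clustering.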
