How can I run cd-hit-est with a cluster threshold less than 0.8?
6.4 years ago
m.koohi.m ▴ 120

Dear friends,

I am trying to run cd-hit-est with a cluster threshold below 0.8, but every time I get the following error:

Fatal Error: invalid clstr threshold, should >=0.8 Program halted !!

I tried:

cd-hit-est -i seq.fasta -o out.fasta  -d 0 -T 10 -g 1 -M 10000 -c 0.6 -n 4

The same options cause no problem with cd-hit. The following command works well:

cd-hit -i seq.fasta -o out.fasta  -d 0 -T 10 -g 1 -M 10000 -c 0.6 -n 4

Am I missing something?

Thank you

5.4 years ago
NPalopoli ▴ 290

According to the cd-hit-est manual you should use one of the following combinations of threshold (-c) and word size (-n):

-n 10, 11 for thresholds 0.95 ~ 1.0
-n 8, 9 for thresholds 0.90 ~ 0.95
-n 7 for thresholds 0.88 ~ 0.9
-n 6 for thresholds 0.85 ~ 0.88
-n 5 for thresholds 0.80 ~ 0.85
-n 4 for thresholds 0.75 ~ 0.8

I don't know whether lower thresholds are allowed, and I don't have suitable input data at hand to try it myself (BTW, you should provide a sample dataset that would allow others to replicate the error).
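
As a rough illustration, the table quoted above can be wrapped in a small shell helper that suggests a word size for a given threshold. This is only a sketch: the function name suggest_word_size is hypothetical, the choice of the smaller value where the table lists two (8 rather than 9, 10 rather than 11) is an assumption, and so is sending a boundary value such as 0.95 to the higher range. The 0.75 ~ 0.8 row comes straight from the quoted table; the error message in the question suggests cd-hit-est may still reject thresholds under 0.8.

```shell
# Suggest a cd-hit-est word size (-n) for an identity threshold (-c),
# following the ranges quoted from the manual above.
# Assumptions: boundary values go to the higher range, and the smaller
# word size is used where the table lists two.
suggest_word_size() {
    awk -v c="$1" 'BEGIN {
        if (c > 1.0 || c < 0.75) exit 1    # outside the quoted ranges
        else if (c >= 0.95) print 10
        else if (c >= 0.90) print 8
        else if (c >= 0.88) print 7
        else if (c >= 0.85) print 6
        else if (c >= 0.80) print 5
        else print 4
    }'
}

# Print (not run) a command line built from the suggestion.
c=0.9
if n=$(suggest_word_size "$c"); then
    echo "cd-hit-est -i seq.fasta -o out.fasta -c $c -n $n"
else
    echo "no word size listed for threshold $c" >&2
fi
```

The helper returns a non-zero status for thresholds outside the table (for example 0.6), so a calling script can stop before invoking cd-hit-est with an unsupported combination.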

29 days ago
Asaf 10k

For future reference

The 0.8 identity threshold for EST (nucleotides) is hardcoded. However, there's an option to use -D (distance) instead of -c (identity threshold). For some reason I couldn't find it in the documentation, and I couldn't figure out how the distance is calculated.


