Word size CD-HIT-EST
2.1 years ago
Nathan ▴ 10

Hello. I am trying to cluster a huge FASTA file using CD-HIT-EST with an identity threshold of 80%. According to the user's guide (http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf), I should use a word size (-n) of 5. However, it is taking forever. Could I change this parameter to -n 10 to speed up the process without changing the final result, i.e., get the same clusters as with -n 5?

This is my command:

cd-hit-est -i input -o output -d 0 -T 16 -g 0 -M 75000 -aL 0.97 -aS 0.97 -c 0.8 -n 5 -b 1
clustering cd-hit-est cd-hit • 1.4k views
2.1 years ago
Mensur Dlakic ★ 27k

In a word, no. A word size of 5 is already the upper limit for clustering at 80% identity. You will have to get a faster computer with more threads and memory, work with a smaller database, or just be patient. It may help to know that the running time is not linear: the longest sequences are clustered first, so the process will speed up as it goes along.
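To see why a longer word cannot simply be swapped in at a low identity threshold, here is a minimal sketch of the worst-case counting argument behind short-word filtering. It is not CD-HIT's actual code: the function name `min_shared_words` is made up for illustration, and it assumes an ungapped alignment of two equal-length sequences in which each mismatch destroys at most `word_size` overlapping words.

```python
def min_shared_words(length: int, identity: float, word_size: int) -> int:
    """Worst-case number of identical words two aligned sequences must share.

    Simplified pigeonhole bound (an assumption for illustration, not
    CD-HIT's implementation): a sequence of `length` bases contains
    length - word_size + 1 overlapping words, and every mismatch can
    break at most `word_size` of them.
    """
    mismatches = round(length * (1.0 - identity))
    total_words = length - word_size + 1
    return max(0, total_words - word_size * mismatches)


if __name__ == "__main__":
    # Compare a few word sizes at 80% and 95% identity for 1000-base sequences.
    for identity in (0.80, 0.95):
        for n in (4, 5, 10):
            print(identity, n, min_shared_words(1000, identity, n))
```

Under these assumptions, two 1000-base sequences at 80% identity are guaranteed 197 shared words at a word size of 4, but none at 5 or 10, while at 95% identity a word size of 10 still guarantees 491. A filter built on long words therefore has nothing reliable to count at 80% and would miss true cluster members, which is why the result would not be the same with -n 10.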

Okay. Thank you for your help!
