How to speed up CD-HIT clustering?
5.6 years ago
bitpir ▴ 240

I'm trying to run CD-HIT to cluster ~250M CDS at the nucleotide and protein levels. These are mostly NR-like sequences from NCBI. According to the paper, it takes ~140 minutes to cluster 4M sequences with 8 cores. When I ran the job, it took more than 12 hours to process 1M sequences. I tried increasing the number of CPUs to 24, but the speed barely changed. Below are the commands I used for the clustering. Any help is appreciated! Thanks!

cd-hit-v4.6.8-2017-1208/cd-hit-est -i f1.nuc -o f1.nuc.out -n 10 -M 0 -T 8 -c 0.95 -r 0
cd-hit-v4.6.8-2017-1208/cd-hit -i f1.pep -o f1.pep.out -n 5 -M 0 -T 8 -c 0.95
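
To put the gap in numbers, here is a quick shell calculation using only the figures quoted above. It is a rough sketch: it assumes the two runs can be compared as sequences per minute, which ignores that clustering time does not scale linearly with input size, and the variable names are just illustrative.

```shell
# Rate reported in the paper: ~4M sequences in ~140 minutes on 8 cores
paper_rate=$(( 4000000 / 140 ))       # sequences per minute (integer division)

# Rate observed here: 1M sequences in more than 12 hours (720 minutes),
# so this value is an upper bound on the real throughput
observed_rate=$(( 1000000 / 720 ))    # sequences per minute (integer division)

# Ratio of the two rates, rounded down
slowdown=$(( paper_rate / observed_rate ))

echo "paper:    ${paper_rate} seq/min"
echo "observed: at most ${observed_rate} seq/min"
echo "slowdown: at least ${slowdown}x"
```

So, under that assumption, my runs are at least about 20 times slower than the paper's figure.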
cdhit protein clustering nucleotide clustering • 1.6k views