Why does cd-hit-est not work when the sequence identity threshold is below 0.95?
21 months ago
JZX • 0

The FASTA file is about 500 MB and the longest sequence is about 100,000 bases. The machine has 60 GB of memory and 16 CPU cores.

It runs normally with -c 0.95 or -c 0.99, but it becomes very slow with -c 0.9, and the CPU and hard disk activity suggest it is not doing any work.

cd-hit-est -M 60000 -T 16 -c 0.9 -n 8 -g 0 -i input -o output

genome cluster sequence • 734 views
7 weeks ago
weidonglu • 0

A cd-hit-est run with -c 0.9 takes much longer than one with -c 0.99 or -c 0.95. Be patient and wait for the analysis to finish.


So, why? I have 18 GB of data. With -c 0.95 the run finishes in about two days, whereas with -c 0.9 it has been running for three days and only a small part is done.


Your problem is most likely due to relatively complex data and too little system memory. In my case, with 20000-30000 transcriptome sequences, the run finishes in 5-10 minutes with -c 0.95 and in 6-10 h with -c 0.90. 18 GB of data is too large for cd-hit-est!
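
One thing worth checking before a long run is that the word size (-n) suits the identity threshold (-c). The sketch below is a minimal helper for that. The function name suggest_word_size is hypothetical, and the threshold ranges are the ones I recall from the CD-HIT user guide for cd-hit-est, so treat them as an assumption and verify them against the documentation for your version. It only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a cd-hit-est word size (-n) for a given
# identity threshold (-c). The ranges are assumed from the CD-HIT user
# guide; check them against the documentation of your installed version.
suggest_word_size() {
    awk -v c="$1" 'BEGIN {
        if      (c >= 0.95) print 10;
        else if (c >= 0.90) print 8;
        else if (c >= 0.88) print 7;
        else if (c >= 0.85) print 6;
        else if (c >= 0.80) print 5;
        else if (c >= 0.75) print 4;
        else { print "threshold below 0.75 is not covered" > "/dev/stderr"; exit 1 }
    }'
}

c=0.9
n=$(suggest_word_size "$c")

# Print the command instead of running it; the other options are copied
# from the command in the question.
echo "cd-hit-est -M 60000 -T 16 -c $c -n $n -g 0 -i input -o output"
```

For -c 0.9 this prints the same command as in the question (-n 8), so under these assumed ranges the word size there is already consistent with the threshold, and the slowdown would have to come from the lower threshold itself rather than from a mismatched -n.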
