CD-HIT minimum sequence length
6.2 years ago
Swatchpuppy

I ran cd-hit on my machine with a set of 1600 sequences, but the program reports that only 1523 were read. I checked which sequences were left out, and all of them are shorter than 15 aa. I have looked all over the documentation and can't find the minimum allowed sequence length. Is there any way to overcome this limitation?
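One way to check which input sequences are affected is to list every sequence below a length cutoff. The sketch below is plain Python with no Biopython dependency; `read_fasta` and `short_sequences` are hypothetical helper names, the cutoff of 15 comes from the question, and the strict "shorter than" comparison is my own choice, not necessarily the exact rule cd-hit applies internally.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (ID, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # keep only the ID (first word)
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def short_sequences(text: str, min_len: int) -> List[Tuple[str, int]]:
    """Return (ID, length) for every sequence strictly shorter than min_len residues."""
    return [(name, len(seq)) for name, seq in read_fasta(text) if len(seq) < min_len]


example = """>prot1
MKTAYIAKQRQISFVKSHFSRQ
>pep1
ACDEFGHIK
>pep2
MKTWLLV
AQRS
"""

print(short_sequences(example, 15))  # [('pep1', 9), ('pep2', 11)]
```

In practice you would pass the contents of your own input file, e.g. `short_sequences(open("input.fasta").read(), 15)`, where the file name is only a placeholder.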

6.2 years ago
SES

The default minimum length in cd-hit is 10, and it is set with the -l option:

-l    length of throw_away_sequences, default 10

That said, cd-hit has several other length-related thresholds, so you may want to check your results against the defaults. For example, there are length-difference cutoffs (-s, -S) and alignment coverage thresholds (-aL, -aS), and these might also be influencing the results.
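To make the length-difference cutoffs concrete, here is a small sketch of how I read cd-hit's -s and -S options from its help text: with -s 0.9, a shorter sequence must be at least 90% of the length of the cluster representative, and -S caps the absolute length difference in residues. The function name `passes_length_difference` is hypothetical, the defaults (0.0 and 999999) are the ones I recall from the help text, and this only illustrates the rule; it is not cd-hit's actual implementation.

```python
def passes_length_difference(seq_len: int, rep_len: int,
                             s: float = 0.0, S: int = 999999) -> bool:
    """Hypothetical sketch of cd-hit's -s / -S length-difference filters.

    s: fraction; the shorter sequence must be at least s * (longer length).
    S: maximum allowed length difference in residues.
    """
    shorter, longer = sorted((seq_len, rep_len))  # order-independent comparison
    return shorter >= s * longer and (longer - shorter) <= S


# A 12 aa peptide against a 300 aa representative passes with the defaults ...
print(passes_length_difference(12, 300))
# ... but fails once -s 0.9 is set, because 12 < 0.9 * 300.
print(passes_length_difference(12, 300, s=0.9))
```

The point of the sketch is that very short sequences can be excluded from a cluster by these cutoffs even when they were read successfully, which is separate from the -l filter that drops them at input time.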