Incremental DNA clustering using cd-hit-est
6.1 years ago

Hey,

I have a lot of short sequences (100 to 800 nt) in an input file that I want to cluster. Every few steps, the script that generates the input file adds about 1000 sequences, and by the end the file is 105M and hard to cluster. I want to know if there's a way to do incremental clustering each time sequences are added. I've read the cdhit wiki but found no information about incremental clustering for cd-hit-est. Any suggestions?

The sequences are small transposable elements called MITEs, and I need to group them into families.

cdhit clustering cluster dna • 1.4k views
6.1 years ago
h.mon 35k

Try clumpify.sh from the BBTools package.


I see that this tool is aimed more at reads, so I'll add further details to my post.
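
One approach that stays within CD-HIT is to use cd-hit-est-2d, which compares a second FASTA file against a first one and writes out only the sequences from the second file that do not match anything in the first. Below is a minimal sketch, assuming the existing cluster representatives are kept in one FASTA file and each new batch of roughly 1000 sequences arrives as another. The function name, file names, and the 0.9 identity threshold are hypothetical placeholders, not values from this thread, and the word size (-n) is left at its default, so it should be checked against the CD-HIT user guide for the chosen threshold. The sketch only builds the command lines and does not run anything.

```python
from typing import List


def incremental_cdhit_commands(
    existing_reps: str,
    new_batch: str,
    prefix: str,
    identity: float = 0.9,  # placeholder threshold, tune for MITE families
) -> List[List[str]]:
    """Build the two CD-HIT command lines for one incremental step.

    Step 1: cd-hit-est-2d keeps the sequences in new_batch that are NOT
            similar to any existing representative.
    Step 2: cd-hit-est clusters those novel sequences among themselves.
    """
    novel = f"{prefix}.novel.fa"            # hypothetical output name
    novel_reps = f"{prefix}.novel_reps.fa"  # hypothetical output name
    step1 = [
        "cd-hit-est-2d",
        "-i", existing_reps,
        "-i2", new_batch,
        "-o", novel,
        "-c", str(identity),
    ]
    step2 = [
        "cd-hit-est",
        "-i", novel,
        "-o", novel_reps,
        "-c", str(identity),
    ]
    return [step1, step2]


if __name__ == "__main__":
    for cmd in incremental_cdhit_commands("reps.fa", "batch_0001.fa", "round1"):
        print(" ".join(cmd))
```

After both steps, appending the novel representatives to the existing representatives file would give the input for the next round, so each round only compares about 1000 new sequences against the current representatives instead of re-clustering the whole file. Each command list can be passed to subprocess.run once CD-HIT is installed. Merging the .clstr membership files across rounds is a separate step; the CD-HIT distribution appears to include helper scripts for that, which is worth checking in its user guide.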
