8.4 years ago
dcdanko
I have a program that outputs ~50 assembled transcripts, each about 10 kb long.
The program already filters out exact duplicate sequences, but many of the assembled transcripts are still very similar to one another.
Is there an existing assembly program that can connect sequences that are identical over 90% of their length?
Have you tried CD-HIT? It clusters and compares protein or nucleotide sequences. For nucleotide input such as transcripts, the relevant program in the package is cd-hit-est.
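To make the clustering idea concrete, here is a minimal Python sketch of greedy, longest-first clustering at a 90% threshold. It is only an illustration and not CD-HIT's actual implementation: the function names `kmers` and `greedy_cluster` are made up for this example, and similarity is approximated by the fraction of shared 8-mers rather than by a real alignment identity.

```python
def kmers(seq, k=8):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs, threshold=0.9, k=8):
    """Greedy incremental clustering, longest sequences first.

    Assumption for this sketch: a sequence joins the first cluster whose
    representative contains at least `threshold` of the sequence's k-mers.
    Otherwise it becomes the representative of a new cluster.
    """
    clusters = []  # list of (representative k-mer set, member sequences)
    for seq in sorted(seqs, key=len, reverse=True):
        words = kmers(seq, k)
        for rep_words, members in clusters:
            if words and len(words & rep_words) / len(words) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((words, [seq]))
    return [members for _, members in clusters]
```

Calling `greedy_cluster(list_of_transcripts)` returns a list of clusters, each starting with its longest member. For real data the tool itself is the better choice. An invocation along the lines of `cd-hit-est -i transcripts.fasta -o clustered.fasta -c 0.9` (file names are placeholders) sets the identity threshold to 90% through `-c`.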