I have a set of full-length 16S rRNA sequences (~1500 nt) and I want to classify them into groups based on an identity threshold (98.5%, the species circumscription threshold). What software do you recommend for this purpose?
PS1: I tried CD-HIT, but it produced some awkward groupings (i.e., sequences from a single species were split across different groups).
PS2: I have computed an identity matrix based on the alignment of 16S rRNA sequences. I realized that I could perform a hierarchical clustering and detect "islands" of values above 98.5% along the diagonal. Basically, the number of squares along the diagonal would represent the number of species present in my dataset. However, I am not sure if this is an appropriate approach.
I think you can try QIIME's pick_otus.py script, where you can change the sequence similarity threshold (the -s/--similarity option) from the default of 0.97 to 0.985.
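On the identity-matrix idea from PS2: cutting a hierarchical clustering at 98.5% seems like a reasonable way to find those "islands", but the result depends on the linkage rule. Below is a minimal sketch in plain Python, assuming complete linkage (my choice, not something stated in the question), so that every pair of sequences inside a group is at least 98.5% identical. The function name and the small matrix are made up for illustration only.

```python
def complete_linkage_clusters(identity, threshold=98.5):
    """Group sequences so every pair within a group has identity >= threshold.

    identity: symmetric square matrix (list of lists) of percent identities.
    Returns a list of groups, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(identity))]

    def worst_identity(a, b):
        # Complete linkage: two clusters are only as similar as their
        # least similar pair of members.
        return min(identity[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = worst_identity(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] < threshold:
            break  # no remaining pair of clusters can be merged
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical percent-identity matrix for five sequences.
identity = [
    [100.0, 99.2, 98.7, 95.0, 94.8],
    [99.2, 100.0, 98.9, 95.3, 94.9],
    [98.7, 98.9, 100.0, 95.1, 95.0],
    [95.0, 95.3, 95.1, 100.0, 99.5],
    [94.8, 94.9, 95.0, 99.5, 100.0],
]
groups = complete_linkage_clusters(identity)
print(groups)       # [[0, 1, 2], [3, 4]]
print(len(groups))  # 2 candidate species
```

With single or average linkage, a chain such as A~B and B~C could pull A and C into the same group even when A and C are below 98.5% to each other, so the number of groups you get can change with the linkage rule. This naive loop is only meant to show the idea; for a large dataset, SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` followed by `fcluster` with `criterion="distance"` should do the same job far more efficiently.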