I have previously used CD-HIT-EST to cluster COI sequences from environmental samples. My current project, however, has more than 75 samples, and my supervisor wants to know whether the sequences from all samples can be pooled and clustered together while still letting us trace which sequences came from which sample. I searched and found that CD-HIT-OTU does something like this with pooled sequences from multiple samples, but it is apparently designed only for 16S rRNA, and my data are exclusively eukaryotic (COI). Could anyone recommend a clustering program for this?
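To make the goal concrete, here is a rough Python sketch of the bookkeeping I have in mind. It is not a tested pipeline. It assumes I prefix every FASTA header with a sample name before pooling, and then read the sample back out of the `.clstr` file that CD-HIT-EST writes. The sample names (`S01`, `S02`), the `|` separator, and both function names are placeholders I made up.

```python
from collections import Counter, defaultdict
import re


def tag_fasta(lines, sample):
    """Prefix each FASTA header with a sample name, e.g. '>S01|read1'.

    The 'sample|id' naming scheme is an assumption, not a CD-HIT requirement.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + sample + "|" + line[1:].lstrip()
        else:
            yield line


# Member lines in a .clstr file look like:
#   0<TAB>658nt, >S01|read1... *
#   1<TAB>650nt, >S02|read9... at +/98.50%
_MEMBER = re.compile(r">(.+?)\.\.\.")


def samples_per_cluster(clstr_lines):
    """Count how many sequences from each sample fall in each cluster."""
    counts = defaultdict(Counter)
    cluster = None
    for line in clstr_lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            cluster = line[1:]  # e.g. 'Cluster 0'
        elif line and cluster is not None:
            match = _MEMBER.search(line)
            if match:
                # Everything before the first '|' is the sample tag.
                sample = match.group(1).split("|", 1)[0]
                counts[cluster][sample] += 1
    return counts
```

The idea would be to run `tag_fasta` over each sample's file, concatenate the results, cluster the pooled file, and then pass the lines of the `.clstr` output to `samples_per_cluster` to get a cluster-by-sample count table. As far as I understand, the `.clstr` file truncates sequence names to 20 characters by default (the `-d` option controls this), so the sample tag would need to sit at the front of the header and stay short.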
Thank you for your time.