I have large alignments of RNA sequences (between 16,000 and 200,000 sequences each) and I need to cluster them by an identity threshold. Specifically, I want to:
- compute the distribution of pairwise sequence identities in each alignment;
- then, based on that distribution, cluster the sequences by an identity threshold, for example create one file with the sequences from the current alignment that are at least 50% identical, another with those at least 60% identical, and so on.
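A minimal sketch of both steps, assuming identity is the percentage of alignment columns where two rows share the same nucleotide (gap characters `-` and `.` never count as a match; this gap rule is an assumption). The helper names `identity`, `identity_histogram` and `greedy_cluster` are hypothetical. Because all pairs of 200,000 sequences is roughly 2×10^10 comparisons, the distribution is estimated from randomly sampled pairs, and the clustering uses a greedy incremental scheme (the approach popularized by CD-HIT) rather than comparing every pair:

```python
import random
from collections import Counter


def identity(a: str, b: str) -> float:
    """Percent of alignment columns where both rows carry the same nucleotide.

    Assumption: a and b are rows of the same alignment (equal length), and a
    column counts as identical only if the shared character is not a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x not in "-.")
    return 100.0 * same / len(a)


def identity_histogram(seqs, n_pairs=100_000, bin_width=10, seed=0):
    """Estimate the pairwise-identity distribution from randomly sampled pairs.

    All n*(n-1)/2 pairs is about 2e10 for 200,000 sequences, so this samples
    n_pairs random pairs instead; returns {bin_start_percent: count}.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(seqs)), 2)  # two distinct indices
        pid = identity(seqs[i], seqs[j])
        counts[int(pid // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering by percent identity.

    Each sequence joins the first cluster whose representative it matches at
    >= threshold percent identity; otherwise it founds a new cluster.
    Returns lists of sequence indices. Results depend on input order.
    """
    reps = []      # index of each cluster's representative sequence
    clusters = []  # member indices, parallel to reps
    for i, s in enumerate(seqs):
        for k, r in enumerate(reps):
            if identity(s, seqs[r]) >= threshold:
                clusters[k].append(i)
                break
        else:  # no representative was close enough
            reps.append(i)
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    seqs = ["ACGU-A", "ACGU-G", "UUUUCC", "ACGA-A"]  # toy aligned rows
    print(identity_histogram(seqs, n_pairs=1000))
    for t in (50, 60, 70):
        print(t, greedy_cluster(seqs, t))
```

On the toy rows, a 60% threshold groups rows 0, 1 and 3 together and leaves row 2 alone, while at 70% every row ends up in its own cluster. Writing one file per threshold would then mean dumping each cluster's members to FASTA. The greedy scheme costs about n × (number of clusters) comparisons, which is still slow in pure Python at this scale, so it is best read as a description of the method rather than a tool for the full data.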
Just to clarify, I define the identity of two sequences as the percentage of positions at which their nucleotides are identical, for example:
seq1 ATA
seq2 GTG
seq1 and seq2 are 33.3% identical (only the middle position matches).
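Written as a formula, for two aligned sequences $s_1, s_2$ of alignment length $L$ (how gap columns are counted is left open here):

$$
\mathrm{identity}(s_1, s_2) = \frac{\left|\{\, i : s_1[i] = s_2[i] \,\}\right|}{L} \times 100\%,
\qquad
\mathrm{identity}(\texttt{ATA}, \texttt{GTG}) = \frac{1}{3} \times 100\% \approx 33.3\%
$$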
My question is: do you know of any software or method that could help me solve these problems?
Thank you all for reading, and thanks in advance for any answers.