I've got 5 from 7 to 30 thousand virus genome sequences per strain, and I need to separate the sequences into groups based on their similarity. How can I do that? I'm able to align each strain with MAFFT, but I don't really know how to cluster them. I'd be really happy to hear an answer.
I've got 5 from 7 to 30 thousand virus genome sequences
You have 5 protein sequences from 30K genomes? 5-7 protein sequences? 5-7 genes?
If you are talking about whole genome clustering, that would not be easy on such a scale. I recommend that you use predicted proteins for each of them. Then:
align them individually
trim the alignments
concatenate those alignments into a super-matrix
make a phylogenetic tree
Beware that each of these steps, especially the last one, will take a long time. Also, there is a large potential for error when working on this scale, even for those who have already done all these steps before. Even if all of this works, it is very difficult to look through a tree that has 30K nodes. Lastly, most of your genomes will be (near-)identical at the protein level, so you still may not get much useful information.
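For the concatenation step in the list above, here is a minimal sketch in plain Python. It assumes each per-protein alignment is FASTA text in which the record ID is the genome ID, and that the same genome ID is used across all proteins. The function names (read_fasta, concatenate_alignments) and the toy sequences are made up for illustration; genomes missing a protein are padded with gaps.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header is the genome ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def concatenate_alignments(alignments, gap="-"):
    """Join per-protein alignments into one super-matrix.

    alignments: list of {genome_id: aligned_sequence} dicts, one per protein.
    Genomes absent from an alignment get a run of gap characters instead.
    """
    genomes = sorted(set().union(*alignments))
    supermatrix = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for g in genomes:
            supermatrix[g].append(aln.get(g, gap * width))
    return {g: "".join(parts) for g, parts in supermatrix.items()}


if __name__ == "__main__":
    # Toy alignments for two hypothetical proteins.
    protein_a = read_fasta(">g1\nMK-L\n>g2\nMKAL\n")
    protein_b = read_fasta(">g1\nGG\n>g3\nGA\n")
    for genome, seq in concatenate_alignments([protein_a, protein_b]).items():
        print(f">{genome}\n{seq}")
```

The result is one long aligned sequence per genome, which is the input a tree-building program expects. With tens of thousands of genomes you would likely stream the files from disk rather than hold everything in memory, but the logic stays the same.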
I realize it is your username, but I feel like it should be riki-Miki-tavi.