I want to build a phylogeny of the serotypes of a virus. I retrieved the sequences from NCBI, and this is what I have:
S1: 200 sequences; S2: 87 sequences; S3: 549 sequences; S4: 8 sequences; S5: 17 sequences
Should I align all of these sequences to build the tree, or should I choose one representative sequence from each serotype?
Moreover, I chose another virus of the same family as the outgroup. Should I align it together with the other sequences?