I have a FASTA file with the protein sequences of 48 genes. I would like to compare them all against each other and see the similarity between them, for example Gene1 against all 48, Gene2 against all 48, and so on, and build a matrix of the pairwise similarities. I tried the EMBOSS Needle tool with the following command:
needle -outfile result.needle -asequence all-proteins.fasta -bsequence all-proteins.fasta -gapopen 10 -gapextend 0.5
But what it does is take the sequence of the first protein and compare it against the 48. Is there a way to do all against all at once, without having to put: Gene1 x all-proteins, Gene2 x all-proteins?
And another issue I noticed was that in one of the cases I don't get the same similarity between the genes. Example: Gene1 x Gene2 (9.6% similarity), but Gene2 x Gene1 (9.7% similarity).
Is there any way to calculate the similarity of all against all so that I get a single result and can do it in one go?
Thanks
See these prior threads for inspiration:
All vs. all pairwise sequence aligment
Make matrix of protein pairwise identities/similarities from multiple protein sequences
percent identities for all by all protein alignment
Clustal Omega can calculate this matrix. Are the proteins of equal/equivalent length?
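If you would rather script it, here is a minimal Python sketch of the all-against-all idea. It is not what needle does internally: it uses a plain Needleman-Wunsch global alignment with simple match/mismatch scores and a linear gap penalty (all assumed values) instead of EBLOSUM62 with gap open/extend, so the percentages will not match needle's. The function names (read_fasta, global_identity, identity_matrix) are made up for illustration. The point is the structure: each unordered pair is aligned only once and the value is mirrored, so Gene1 x Gene2 and Gene2 x Gene1 always agree and the whole matrix comes out in one run.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity from a simple global alignment.

    Assumption: linear gap penalty and match/mismatch scoring,
    NOT the substitution matrix / affine gaps used by EMBOSS needle.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, same, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * same / length if length else 0.0


def identity_matrix(records):
    """Symmetric all-vs-all matrix; each unordered pair is aligned once."""
    names = list(records)
    matrix = {x: {y: 100.0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        pct = global_identity(records[x], records[y])
        matrix[x][y] = matrix[y][x] = pct  # mirror so A x B == B x A
    return matrix


if __name__ == "__main__":
    # Toy sequences for illustration only; replace with the contents
    # of all-proteins.fasta, e.g. open("all-proteins.fasta").read().
    demo = ">Gene1\nMKTAYIAK\n>Gene2\nMKTAYIAK\n>Gene3\nMKTAHIAK\n"
    result = identity_matrix(read_fasta(demo))
    names = list(result)
    print("\t" + "\t".join(names))
    for x in names:
        print(x + "\t" + "\t".join(f"{result[x][y]:.1f}" for y in names))
```

For 48 proteins that is 1128 pairwise alignments, which pure Python handles fine for typical protein lengths, though a dedicated tool will be much faster. If you need needle's exact scoring, the same loop-over-unordered-pairs layout could wrap calls to needle instead of global_identity.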