Very simple question – I have a list of, say, 1000 protein sequences in a fasta file. I want to compute percent identities for each pair of sequences. So, in this example, it'd generated a 1000 by 1000 matrix reflected over the diagonal.
I've tried diamond and blastp, but even when I set -max-target-seqs to 0 or set the evalue/min percent identity to appropriately high/low values, they still don't report all possible alignments. There is some sort of filtering still going on, it would appear. Hopefully I'm just missing some parameter?
Any recommendations for how to get this done quickly? I'd hard code it myself but I'd rather have one of these algorithms do it for me, as they'll presumably be faster at full scale.
Thanks in advance.