I am running a homology search with HHblits v3.3.0 against the UniRef30_2023_02 database. I want to use the alignment written in a3m format (-oa3m flag) while cross-referencing it to the summary file in hhr format; specifically, I want to get the score and E-value of each sequence from the hhr file.
However, there is no clear correspondence between the sequences reported in the two files. For example, my current a3m file contains 537 sequences, while the hhr file lists only 15 hits. I understand that I can get more hits in the hhr file by relaxing the reporting thresholds (e.g. the -E flag). But I also noticed that not every sequence in the hhr file is present in the a3m file, so I am not sure what controls which sequences are written to the a3m file versus the hhr file.
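To make the mismatch concrete, a minimal Python sketch of the comparison could parse the hit list at the top of the hhr file and the headers of the a3m file, then report which hhr hits have no a3m entry and which a3m entries can be annotated with an hhr E-value and score. It assumes that the summary table follows the usual "No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM" layout, that the first token of each hit name and of each a3m header is the same identifier, and that headers starting with ss_ are secondary-structure annotation lines rather than sequences. The helper names (parse_hhr_hits, parse_a3m_ids, cross_reference, report) are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed layout of one row in the hhr summary table (columns: No, Hit, Prob,
# E-value, P-value, Score, SS, Cols, Query HMM, Template HMM), e.g.
#   1 UniRef100_X Some description  99.9 1.4E-31 1.7E-35  173.7   0.0  106    1-106     1-106 (106)
HIT_ROW = re.compile(
    r"^\s*\d+\s+(?P<name>\S+).*?"
    r"\s+(?P<prob>-?\d+\.\d)\s+(?P<evalue>\S+)\s+\S+"  # Prob, E-value, P-value
    r"\s+(?P<score>-?\d+\.\d)\s+-?\d+\.\d\s+\d+"        # Score, SS, Cols
    r"\s+\d+-\d+\s+\d+-\d+\s*\(\s*\d+\s*\)\s*$"         # Query HMM, Template HMM (length)
)


@dataclass
class Hit:
    name: str
    prob: float
    evalue: float
    score: float


def parse_hhr_hits(path):
    """Return {hit name: Hit} from the hhr summary table.

    If a name occurs in more than one row, the first (best-ranked) row is kept.
    """
    hits = {}
    in_table = False
    with open(path) as fh:
        for line in fh:
            if line.lstrip().startswith("No Hit"):
                in_table = True  # header line of the summary table
            elif in_table:
                if not line.strip():
                    if hits:
                        break  # a blank line after the rows ends the summary table
                    continue
                m = HIT_ROW.match(line)
                if m:
                    hits.setdefault(m["name"], Hit(m["name"], float(m["prob"]),
                                                   float(m["evalue"]), float(m["score"])))
    return hits


def parse_a3m_ids(path):
    """Return the first header token of every a3m entry, skipping ss_* annotation lines."""
    ids = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields and not fields[0].startswith("ss_"):
                    ids.append(fields[0])
    return ids


def cross_reference(hhr_path, a3m_path):
    hits = parse_hhr_hits(hhr_path)
    a3m_ids = parse_a3m_ids(a3m_path)
    in_a3m = set(a3m_ids)
    missing = [name for name in hits if name not in in_a3m]  # in hhr, absent from a3m
    annotated = {i: hits[i] for i in a3m_ids if i in hits}   # a3m entries with hhr stats
    return hits, a3m_ids, missing, annotated


def report(hhr_path, a3m_path):
    hits, a3m_ids, missing, annotated = cross_reference(hhr_path, a3m_path)
    print(f"{len(a3m_ids)} a3m entries, {len(hits)} hhr hits, {len(annotated)} shared")
    for name in missing:
        h = hits[name]
        print(f"hhr only: {name}  Prob={h.prob}  E-value={h.evalue:.2g}  Score={h.score}")
```

Calling report("output.hhr", "output.a3m") prints the number of entries in each file and the Prob, E-value, and Score of every hhr hit that has no matching a3m header. Matching on the first header token is a deliberate simplification; if the two files label the same sequence differently, parse_a3m_ids is the place to adapt the key.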
Any insight would be appreciated! I have looked through the HH-suite manual but have not found anything that addresses this. Does the hhr file report only the representative of each cluster, while the a3m file contains all sufficiently diverse members of those clusters? If so, why do some sequences appear in the hhr file but not in the a3m file?
For reference, this is the hhblits command I am currently using:
hhblits -i [input_fasta] -o [output.hhr] -oa3m [output.a3m] -n 1 -d [PATH_TO_UNIREF30_2023_02] -Z 10000 -B 10000 -E 0.001