I've done a bit of searching but haven't seen this specific problem raised (or answered) before. Apologies if it has been.
I'm interested in a particular (bacterial) protein family, and I have reason to believe the genomic context of these proteins will be interesting as well. What I'd like to do is obtain every unique genomic context in which members of the family occur. That is, even when two proteins are identical, I want to treat them as separate cases if the genomic regions surrounding their genes differ.
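To pin down what I mean by "unique", here is a minimal Python sketch of the comparison I have in mind. It assumes each hit has already been placed on its contig as an ordered list of (label, strand) pairs, where the label is a protein accession or a family ID. The names `neighborhood` and `group_by_context` and the default window of 5 genes are my own, not from any library. The target gene is masked so that two hits share a context only when their surrounding genes match, and minus-strand hits are flipped so orientation alone doesn't split otherwise identical neighborhoods.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (label, strand), strand is "+" or "-".
Gene = Tuple[str, str]
Key = Tuple[Gene, ...]


def neighborhood(contig: List[Gene], idx: int, window: int = 5) -> Key:
    """Genes within `window` positions of contig[idx], oriented so the
    target always reads on the + strand."""
    lo, hi = max(0, idx - window), min(len(contig), idx + window + 1)
    # Mask the target itself so only the surrounding genes (and strands) matter.
    region = [("<target>" if i == idx else label, strand)
              for i, (label, strand) in enumerate(contig[lo:hi], start=lo)]
    if contig[idx][1] == "-":
        # Reverse the order and flip every strand: the same locus read
        # from the opposite strand then yields the same key.
        region = [(label, "+" if strand == "-" else "-")
                  for label, strand in reversed(region)]
    return tuple(region)


def group_by_context(hits: Dict[str, Tuple[List[Gene], int]],
                     window: int = 5) -> Dict[Key, List[str]]:
    """Map each distinct neighborhood to the hit IDs that share it."""
    groups: Dict[Key, List[str]] = {}
    for hit_id, (contig, idx) in hits.items():
        groups.setdefault(neighborhood(contig, idx, window), []).append(hit_id)
    return groups
```

Hits near a contig edge get a truncated window and therefore their own key. Whether that's desirable probably depends on how fragmented the assemblies are. Dropping the masking would additionally split contexts by the target's own sequence.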
So far, I've obtained a set of UniProt identifiers that match my HMM and extracted the corresponding protein IDs from them. Many of these are RefSeq WP_ accessions, which are non-redundant records: a single WP_ protein can be encoded in many different genomes. I think I've found a way to use Entrez to link from these non-redundant records to genome accessions, but it's rather slow.
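For reference, this is roughly the batched approach I'd hope would beat linking one record at a time. It's a sketch that assumes NCBI's efetch accepts `rettype=ipg` for the protein database (the Identical Protein Groups report that EDirect exposes as `-format ipg`). It also assumes the report is tab-separated, with columns named `Id`, `Nucleotide Accession`, `Start`, `Stop`, `Strand`, `Protein` and `Assembly`. I haven't checked those names against every response. The function names, the batch size of 200 and the 0.34 s pause are my own choices.

```python
import csv
import io
import time
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a coordinate, returning None when the field is empty."""
    value = (value or "").strip()
    return int(value) if value.isdigit() else None


def parse_ipg(text: str) -> Dict[str, List[dict]]:
    """Group IPG report rows (one per CDS) by identical-protein-group Id."""
    groups: Dict[str, List[dict]] = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        ipg_id = row.get("Id")
        if not ipg_id or ipg_id == "Id":  # skip blank lines and repeated headers
            continue
        groups.setdefault(ipg_id, []).append({
            "protein": row.get("Protein") or "",
            "nucleotide": row.get("Nucleotide Accession") or "",
            "start": _to_int(row.get("Start")),
            "stop": _to_int(row.get("Stop")),
            "strand": row.get("Strand") or "",
            "assembly": row.get("Assembly") or "",
        })
    return groups


def fetch_ipg(accessions: List[str], batch_size: int = 200,
              email: str = "") -> Dict[str, List[dict]]:
    """POST protein accessions to efetch in batches and merge the IPG rows."""
    merged: Dict[str, List[dict]] = {}
    for i in range(0, len(accessions), batch_size):
        params = {"db": "protein", "rettype": "ipg", "retmode": "text",
                  "id": ",".join(accessions[i:i + batch_size])}
        if email:
            params["email"] = email
        # POST rather than GET so long ID lists don't exceed URL length limits.
        req = urllib.request.Request(
            EFETCH, data=urllib.parse.urlencode(params).encode())
        with urllib.request.urlopen(req) as resp:
            for ipg_id, rows in parse_ipg(resp.read().decode()).items():
                merged.setdefault(ipg_id, []).extend(rows)
        time.sleep(0.34)  # stay under ~3 requests/second without an API key
    return merged
```

Each row gives a nucleotide accession plus coordinates, which is what's needed to pull out the surrounding genes. For very widespread proteins the report can list thousands of CDSs, so the output may be large.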
Is there a better way to do this? I feel like there must be, but I haven't been able to put it together.