I'm automating the assembly of paired multiple sequence alignments (i.e. MSAs in which two proteins are aligned one after the other) so that I can do some paired sequence processing on them.
To do this, I'm querying two protein families from Pfam and trying to pair up their members.
As I understand it, showing that the genomic locations of the two proteins are adjacent is enough to associate them with high confidence in my case (since they're situated in the same operon).
So, this is the question:
Given the XML information in UniProt, what is the best way to establish genomic proximity/adjacency? (Can I find it easily using the Biopython API, for example?)
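To make the question concrete, here is a rough sketch of the kind of check I have in mind. It uses only the Python standard library rather than Biopython. It assumes that an entry carries a `<gene><name type="ordered locus">` element and that locus tags end in a number that follows gene order along the chromosome; neither assumption holds for every entry or genome. The sample XML, the accession and locus tags in it, and the helper names (`ordered_locus_names`, `split_locus_tag`, `look_adjacent`) are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed entry for illustration only (not real UniProt data).
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <gene>
      <name type="ordered locus">ABC_0123</name>
    </gene>
  </entry>
</uniprot>"""


def local(tag):
    """Strip an XML namespace such as '{http://...}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def ordered_locus_names(xml_text):
    """Return the text of every <name type="ordered locus"> element found."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if local(el.tag) == "name" and el.get("type") == "ordered locus" and el.text
    ]


def split_locus_tag(tag):
    """Split 'ABC_0123' into ('ABC_', 123); return None without trailing digits."""
    match = re.fullmatch(r"(.*?)(\d+)", tag)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def look_adjacent(tag_a, tag_b, max_gap=1):
    """Heuristic: same prefix and numeric suffixes at most max_gap apart.

    Assumption: suffixes follow gene order. Some genomes number their
    locus tags in steps of 5 or 10, so max_gap may need adjusting.
    """
    a, b = split_locus_tag(tag_a), split_locus_tag(tag_b)
    if a is None or b is None or a[0] != b[0]:
        return False
    return 0 < abs(a[1] - b[1]) <= max_gap


if __name__ == "__main__":
    tags = ordered_locus_names(SAMPLE_XML)
    print(tags)
    print(look_adjacent(tags[0], "ABC_0124"))
```

I don't know whether a test like this is reliable in practice, which is part of what I'm asking.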
If the specific locus information is missing (I suspect this is often the case), is it appropriate to compare the UniProt identifiers (e.g. K0D1W6 in http://www.uniprot.org/uniprot/K0D1W6.xml) for similarity using some measure?
Hope to get some intelligent mind out there to help me. I'd be forever grateful!