Get Uniprot entry name from PDB ID and chain (solved)
7.2 years ago

Hello everyone,

I'm working with a file containing a large number of PDB identifiers, and for each identifier I have two or more chains. What I want is a list with the UniProt entry name corresponding to each of the chains.

I want an automated way to iterate over the file in Python, so that I can incorporate it into a script I already have.

If anyone can help, I would be very grateful.

Albert

Python • 7.1k views
7.2 years ago
zlira ▴ 80

There is such an endpoint at PDB: http://www.rcsb.org/pdb/software/rest.do (see the section on "Third-party annotations and PDB to UniProtKB mapping").

Example mapping for 4hhb.A chain: http://www.rcsb.org/pdb/rest/das/pdb_uniprot_mapping/alignment?query=4hhb.A

You can make a request to that endpoint and parse the UniProt accession ID of a chain out of the XML that is returned. If you need any additional information (such as the protein name on UniProt), you can use the accession ID to look it up.

Here is some sample code:

import requests
from xml.etree.ElementTree import fromstring

pdb_id = '4hhb.A'
pdb_mapping_url = 'http://www.rcsb.org/pdb/rest/das/pdb_uniprot_mapping/alignment'
uniprot_url = 'http://www.uniprot.org/uniprot/{}.xml'

def get_uniprot_accession_id(response_xml):
    # Return the accession of the first UniProt object in the alignment XML.
    root = fromstring(response_xml)
    # root[0] replaces root.getchildren()[0]; getchildren() was removed in Python 3.9.
    return next(
        el for el in root[0]
        if el.attrib.get('dbSource') == 'UniProt'
    ).attrib['dbAccessionId']

def get_uniprot_protein_name(uniprot_id):
    # Fetch the UniProt XML entry and return the recommended full protein name.
    uniprot_response = requests.get(
        uniprot_url.format(uniprot_id)
    ).text
    return fromstring(uniprot_response).find(
        './/{http://uniprot.org/uniprot}recommendedName/{http://uniprot.org/uniprot}fullName'
    ).text

def map_pdb_to_uniprot(pdb_id):
    # Map a 'PDBID.chain' query to its UniProt accession and protein name.
    pdb_mapping_response = requests.get(
        pdb_mapping_url, params={'query': pdb_id}
    ).text
    uniprot_id = get_uniprot_accession_id(pdb_mapping_response)
    uniprot_name = get_uniprot_protein_name(uniprot_id)
    return {
        'pdb_id': pdb_id,
        'uniprot_id': uniprot_id,
        'uniprot_name': uniprot_name
    }

print(map_pdb_to_uniprot(pdb_id))


Result:

{'pdb_id': '4hhb.A', 'uniprot_id': 'P69905', 'uniprot_name': 'Hemoglobin subunit alpha'}
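
Since the question is about a whole file of identifiers, here is a minimal sketch of how the query strings could be built. The input layout is an assumption (the question does not say how the file is organised): one PDB code per line followed by its chain letters, separated by whitespace, e.g. "4hhb A B". The helper name read_pdb_chain_ids is hypothetical.

```python
def read_pdb_chain_ids(lines):
    """Yield 'PDBID.chain' query strings from lines such as '4hhb A B'.

    Assumed layout (not stated in the question): PDB code first, then one or
    more chain identifiers, all separated by whitespace.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment lines starting with '#'.
        if not fields or fields[0].startswith('#'):
            continue
        pdb_code, chains = fields[0], fields[1:]
        for chain in chains:
            yield '{}.{}'.format(pdb_code, chain)
```

Because a file object is itself an iterable of lines, an open file handle can be passed straight to read_pdb_chain_ids, and each yielded string can then be given to map_pdb_to_uniprot.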


Wow, that's amazing, but it's not quite what I want. The UniProt entry name that I'm looking for has this format: HBA_HUMAN

EDIT: Thanks for the help

EDIT2: I solved the problem like this:

def get_uniprot_protein_name(uniprot_id):
    # Return the UniProt entry name (e.g. HBA_HUMAN) instead of the full protein name.
    uniprot_response = requests.get(
        uniprot_url.format(uniprot_id)
    ).text
    return fromstring(uniprot_response).find(
        './/{http://uniprot.org/uniprot}entry/{http://uniprot.org/uniprot}name'
    ).text


You can get that name by changing the path passed to find() in get_uniprot_protein_name from:

'.//{http://uniprot.org/uniprot}recommendedName/{http://uniprot.org/uniprot}fullName'


to:

'.//{http://uniprot.org/uniprot}name'


Yes, that is more or less what I have done. I just want to thank you for your help (in this and other posts).


Hi, I'm having a problem with the script right now.

The problem is the following:

Given a PDB code plus a chain as input, for instance 2VLJ.E, the chain exists on the PDB website, but in some cases it is not linked to any UniProt entry name.

I would like to know how to ignore the cases where a chain has no hits, or how to raise a warning without killing the program.
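
One possible way to do this, as a rough sketch: return None when no UniProt object is present instead of letting next() raise StopIteration, and emit a warning for that chain before moving on. The names find_uniprot_accession and map_chains are hypothetical, and fetch_mapping_xml stands in for whatever function performs the HTTP request. What the service actually returns for an unmapped chain is an assumption here (either XML without a UniProt object, or a body that does not parse).

```python
import warnings
from xml.etree.ElementTree import ParseError, fromstring


def find_uniprot_accession(response_xml):
    """Return the UniProt accession in a mapping response, or None if absent."""
    try:
        root = fromstring(response_xml)
    except ParseError:
        # Assumption: an empty or malformed body means there is no mapping.
        return None
    for el in root.iter():
        if el.attrib.get('dbSource') == 'UniProt':
            return el.attrib.get('dbAccessionId')
    return None


def map_chains(pdb_chain_ids, fetch_mapping_xml):
    """Map each 'PDBID.chain' id to an accession, warning on chains with no hit.

    fetch_mapping_xml is a hypothetical callable that takes a 'PDBID.chain'
    string and returns the mapping XML as text.
    """
    results = {}
    for pdb_chain_id in pdb_chain_ids:
        accession = find_uniprot_accession(fetch_mapping_xml(pdb_chain_id))
        if accession is None:
            # warnings.warn reports the problem without stopping the loop.
            warnings.warn('No UniProt mapping found for {}'.format(pdb_chain_id))
            continue
        results[pdb_chain_id] = accession
    return results
```

Chains without a hit are simply left out of the returned dictionary, so the rest of the script only ever sees chains that have an accession.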


Hi! Thank you for such nice code for getting the UniProt entry. Could you please edit the function so that it also returns the organism of a given chain?
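
A possible sketch for the organism, assuming the UniProt XML entry has an organism element with a name child carrying type="scientific" (this is how I understand the UniProt XML layout, but check it against a real entry). The function name get_uniprot_organism is hypothetical; it takes the XML text that the thread's code already downloads via uniprot_url.

```python
from xml.etree.ElementTree import fromstring

# Namespace used by UniProt XML entries, as in the code earlier in the thread.
UNIPROT_NS = '{http://uniprot.org/uniprot}'


def get_uniprot_organism(uniprot_xml):
    """Return the scientific organism name from UniProt XML text, or None."""
    root = fromstring(uniprot_xml)
    # Assumed layout: <organism><name type="scientific">...</name></organism>
    el = root.find(
        ".//{0}organism/{0}name[@type='scientific']".format(UNIPROT_NS)
    )
    return el.text if el is not None else None
```

Returning None rather than raising keeps this consistent with skipping entries that lack the expected element.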