Why is my UniProt API request working but failing to get back results?
12 months ago
tidalArms • 0

I am trying to run some commands in Python, using the urllib and requests libraries, to access the API of the UniProt protein database. All the code does is send in UniProt IDs and get back the pertinent gene names (e.g. P08603 corresponds to the gene name CFH). The code has worked fine for all my other projects, but I am having difficulty with one new input, even though it has the same content format as previous inputs.

import re
import pandas as pd
import time
from urllib.parse import urlparse, parse_qs, urlencode
import requests
from requests.adapters import HTTPAdapter, Retry

POLLING_INTERVAL = 3

API_URL = "https://rest.uniprot.org"

retries = Retry(total=5, backoff_factor=0.25, status_forcelist=[500, 502, 503, 504])
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))

def submit_id_mapping(from_db, to_db, ids):
    r = requests.post(
        f"{API_URL}/idmapping/run",
        data={"from": from_db, "to": to_db, "ids": ",".join(ids)},
     )
    r.raise_for_status()
    print(r)
    return r.json()["jobId"]


def get_id_mapping_results_link(job_id):
    url = f"{API_URL}/idmapping/details/{job_id}"
    r = session.get(url)
    r.raise_for_status()
    return r.json()["redirectURL"]


def check_id_mapping_results_ready(job_id):
    while True:
        r = session.get(f"{API_URL}/idmapping/status/{job_id}")
        r.raise_for_status()
        j = r.json()
        if "jobStatus" in j:
            if j["jobStatus"] == "RUNNING":
                print(f"Retrying in {POLLING_INTERVAL}s")
                time.sleep(POLLING_INTERVAL)
            else:
                raise Exception(j["jobStatus"])
        else:
            return bool(j["results"] or j["failedIds"])

def combine_batches(all_results, batch_results, file_format):
    if file_format == "json":
        for key in ("results", "failedIds"):
            if batch_results[key]:
                all_results[key] += batch_results[key]
    else:
        return all_results + batch_results
    return all_results

def decode_results(response, file_format):
    if file_format == "json":
        return response.json()
    elif file_format == "tsv":
        return [line for line in response.text.split("\n") if line]
    return response.text


def get_id_mapping_results_search(url):
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    file_format = query["format"][0] if "format" in query else "json"
    if "size" in query:
        size = int(query["size"][0])
    else:
        size = 500
        query["size"] = size
    parsed = parsed._replace(query=urlencode(query, doseq=True))
    url = parsed.geturl()
    r = session.get(url)
    r.raise_for_status()
    results = decode_results(r, file_format)
    total = int(r.headers["x-total-results"])
    print_progress_batches(0, size, total)
    for i, batch in enumerate(get_batch(r, file_format)):
        results = combine_batches(results, batch, file_format)
        print_progress_batches(i + 1, size, total)
    return results


def map_uniprot_identifiers(list_ids, from_id='UniProtKB_AC-ID', to_id='Gene_Name'):
    mapping_dict = {}
    try:
        job_id = submit_id_mapping(from_db=from_id, to_db=to_id, ids=list_ids)
        print(job_id)
        if check_id_mapping_results_ready(job_id):
            link = get_id_mapping_results_link(job_id)
            print(link)
        results = get_id_mapping_results_search(link)
        results = pd.DataFrame(results['results'])
        print(results)
        mapping_dict = dict(zip(results['from'], results['to']))
        print(mapping_dict)
    except Exception as err:
        print("the error is:")
        print(err)

    return mapping_dict  

While the first function submit_id_mapping() works, and the line print(job_id) yields an ID that, pasted into a results URL in a browser, brings up a page with all the necessary information (e.g. https://rest.uniprot.org/idmapping/results/afcfe615e6a85f09c95a8734b708abca1cce78ce), the results dataframe produced by get_id_mapping_results_search() is completely empty, so mapping_dict comes back empty as well. Given that the relevant information is clearly available on the site, I don't know why this function is not working. I have checked the input, and its formatting is properly set up. I have tried more debugging inside get_id_mapping_results_search(), but I can't find where the problem occurs. I have also applied the same debugging to a different input that did work in the past, but I still can't pinpoint the issue. The only error I keep getting is that I have 'failedIds', which makes no sense, considering that the UniProt IDs my input provides do have pertinent gene names in UniProt (the URL above is proof of this). So why can't I map the UniProt IDs to their gene names?
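As a debugging side note (this helper is my own sketch, not part of the code above): the JSON returned by the /idmapping/status endpoint carries a failedIds field, so you can pull out exactly which inputs UniProt rejected before going any further.

```python
# Hypothetical helper for inspecting the JSON from UniProt's /idmapping/status
# endpoint: returns the list of IDs the service could not map, or [] if none.
def extract_failed_ids(status_json):
    return status_json.get("failedIds", [])

# Hand-made payload shaped like a status response, for illustration only:
sample = {"results": [{"from": "P08603", "to": "CFH"}], "failedIds": ["A0A0G2JPR0"]}
print(extract_failed_ids(sample))  # -> ['A0A0G2JPR0']
```

Printing that list against the real status response would tell you whether the empty dataframe is caused by the IDs themselves or by the pagination/decoding logic downstream.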

uniprot python api-request rest-api • 1.1k views

Is this your code exactly?

  • You have an indentation problem for return mapping_dict in what you've shared.
  • You use results = pd.DataFrame but don't import pandas.
  • name check_id_mapping_results_ready is not defined
  • name get_id_mapping_results_link is not defined
  • name decode_results is not defined

Provide examples of how you use your functions with a minimal working example code. And hopefully one that produces the issue.

Be up front and clear about the source of the code if you didn't draft it. Note that the first sentence in this post points at source code very similar to yours.



Example of MWE:

Code:

import re
import time
import json
import zlib
from urllib.parse import urlparse, parse_qs, urlencode
import requests
from requests.adapters import HTTPAdapter, Retry
import pandas as pd

POLLING_INTERVAL = 3

API_URL = "https://rest.uniprot.org"


retries = Retry(total=5, backoff_factor=0.25, status_forcelist=[500, 502, 503, 504])
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))

def check_response(response):
    try:
        response.raise_for_status()
    except requests.HTTPError:
        print(response.json())
        raise


def submit_id_mapping(from_db, to_db, ids):
    r = requests.post(
        f"{API_URL}/idmapping/run",
        data={"from": from_db, "to": to_db, "ids": ",".join(ids)},
     )
    r.raise_for_status()
    print(r)
    return r.json()["jobId"]

def get_id_mapping_results_link(job_id):
    url = f"{API_URL}/idmapping/details/{job_id}"
    request = session.get(url)
    check_response(request)
    return request.json()["redirectURL"]

def print_progress_batches(batch_index, size, total):
    n_fetched = min((batch_index + 1) * size, total)
    print(f"Fetched: {n_fetched} / {total}")

def get_id_mapping_results_search(url):
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    file_format = query["format"][0] if "format" in query else "json"
    if "size" in query:
        size = int(query["size"][0])
    else:
        size = 500
        query["size"] = size
    parsed = parsed._replace(query=urlencode(query, doseq=True))
    url = parsed.geturl()
    print(url)
    r = session.get(url)
    print(r)
    r.raise_for_status()
    results = decode_results(r, file_format, False)
    total = int(r.headers["x-total-results"])
    print_progress_batches(0, size, total)
    for i, batch in enumerate(get_batch(r, file_format, False)):
        results = combine_batches(results, batch, file_format)
        print_progress_batches(i + 1, size, total)
    return results

def get_batch(batch_response, file_format, compressed):
    batch_url = get_next_link(batch_response.headers)
    while batch_url:
        batch_response = session.get(batch_url)
        batch_response.raise_for_status()
        yield decode_results(batch_response, file_format, compressed)
        batch_url = get_next_link(batch_response.headers)

def check_id_mapping_results_ready(job_id):
    while True:
        request = session.get(f"{API_URL}/idmapping/status/{job_id}")
        check_response(request)
        j = request.json()
        if "jobStatus" in j:
            if j["jobStatus"] == "RUNNING":
                print(f"Retrying in {POLLING_INTERVAL}s")
                time.sleep(POLLING_INTERVAL)
            else:
                raise Exception(j["jobStatus"])
        else:
            return bool(j["results"] or j["failedIds"])

def get_next_link(headers):
    re_next_link = re.compile(r'<(.+)>; rel="next"')
    if "Link" in headers:
        match = re_next_link.match(headers["Link"])
        if match:
            return match.group(1)


def decode_results(response, file_format, compressed):
    if compressed:
        decompressed = zlib.decompress(response.content, 16 + zlib.MAX_WBITS)
        if file_format == "json":
            j = json.loads(decompressed.decode("utf-8"))
            return j
        elif file_format == "tsv":
            return [line for line in decompressed.decode("utf-8").split("\n") if line]
        elif file_format == "xlsx":
            return [decompressed]
        elif file_format == "xml":
            return [decompressed.decode("utf-8")]
        else:
            return decompressed.decode("utf-8")
    elif file_format == "json":
        return response.json()
    elif file_format == "tsv":
        return [line for line in response.text.split("\n") if line]
    elif file_format == "xlsx":
        return [response.content]
    elif file_format == "xml":
        return [response.text]
    return response.text

def map_uniprot_identifiers(list_ids, from_id='UniProtKB_AC-ID', to_id='Gene_Name'):
    mapping_dict = {}
    results = pd.DataFrame()  # initialized so the final return works even if an error occurs early
    try:
        job_id = submit_id_mapping(from_db=from_id, to_db=to_id, ids=list_ids)
        print(job_id)
        time.sleep(0.5)
        if check_id_mapping_results_ready(job_id):
            link = get_id_mapping_results_link(job_id)
            print(link)
        results = get_id_mapping_results_search(link)
        results = pd.DataFrame(results['results'])
        print(results)
        mapping_dict = dict(zip(results['from'], results['to']))
        print(mapping_dict)
    except Exception as err:
        print("the error is:")
        print(err)
    return mapping_dict, results

How to use the code

Bring up a temporary Jupyter session in your browser.
Paste in the code above in a new notebook.

After running the code above, run the following:

the_returned_dict, df = map_uniprot_identifiers(["P10643","P11717","P00450","Q86VB7","P27169","P01871","P06727","O00299", "Q9UBX5", "B7ZKJ8","A0A0G2JPR0","P09493","P35443","Q9Y4F1","P23141", "Q8WWA0", "P04792", "P26447", "P07237", "P08571", "Q9UPN3", "P14151", "P49908", "P33151", "P26038"])

To see the dataframe, run df in a cell.
To see the returned dictionary, run the_returned_dict in a cell.



Easier alternative: use Unipressed, the Python package for querying UniProt's new REST API

Alternative option, using Unipressed, fully described at the top of this post:

from unipressed import IdMappingClient
import time
request = IdMappingClient.submit(
    source="UniProtKB_AC-ID", dest="Gene_Name", ids={"P10643","P11717","P00450","Q86VB7","P27169","P01871","P06727","O00299", "Q9UBX5", "B7ZKJ8","A0A0G2JPR0","P09493","P35443","Q9Y4F1","P23141", "Q8WWA0", "P04792", "P26447", "P07237", "P08571", "Q9UPN3", "P14151", "P49908", "P33151", "P26038"}
)
time.sleep(1)
results_list = list(request.each_result())
import pandas as pd
results_df = pd.DataFrame(results_list)

To see the dataframe, run results_df in a cell.
To see the raw list of ID mappings, run results_list in a cell.
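If you want the same {accession: gene name} dictionary that map_uniprot_identifiers() was meant to build, the Unipressed records (each a dict with "from" and "to" keys) collapse into one with a comprehension. The records below are hard-coded stand-ins for what request.each_result() yields; the first pair is the example from the question:

```python
# Stand-in records mimicking Unipressed id-mapping results; the real ones
# come from request.each_result() above.
results_list = [
    {"from": "P08603", "to": "CFH"},
    {"from": "P27169", "to": "PON1"},
]
# Collapse the records into an accession -> gene name dictionary.
mapping_dict = {rec["from"]: rec["to"] for rec in results_list}
print(mapping_dict)  # -> {'P08603': 'CFH', 'P27169': 'PON1'}
```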


I have updated my code to fix the indentation errors and to post the missing code. I spent most of my time debugging the first code I posted, so I forgot to include the other functions mentioned. I will try out Unipressed tomorrow.


Please do not delete the post once you get an answer; doing so is considered misuse of the site. The person who answered your question would not have done so had they known you would delete the post.
