Ensembl Homology REST requests - possible to download whole database and query locally?
18 months ago
ngarber ▴ 60

I'm querying the Ensembl Homology REST database (https://rest.ensembl.org/homology/id/) with a list of genes to get their homologs, but my list of IDs is pretty long, so this takes quite a while. I'm doing the requests in Python, which is the only language I work in... alas, I know there is a Perl API, but I have no idea how to use it.

Is there a way to download all entries in the Ensembl Homology REST database and then query them locally?

Here is my code as it currently stands, which requests one entry at a time, since I believe REST can't accept requests for multiple genes (but please correct me if I'm wrong). Hopefully there is a way to do this locally...

import time

import pandas as pd
import requests

# data_df is generated elsewhere and contains a list of genes and data
gene_id_list = data_df[ensembl_gene_col].tolist()
gene_id_list = list(dict.fromkeys(gene_id_list))  # removes duplicates, keeps order

rest_server = "https://rest.ensembl.org"
rest_ext = "/homology/id/"

gene_homologies_dict = {}
for i, gene_id in enumerate(gene_id_list, start=1):  # start=1 so progress reads "1 of N"
    if gene_id == "None":
        continue  # skip placeholder IDs
    print(f"Retrieving homology data for {gene_id} ({i} of {len(gene_id_list)})")
    query_url = rest_server + rest_ext + gene_id
    response = requests.get(query_url, headers={"Content-Type": "application/json"})
    response.raise_for_status()  # raises on any 4xx/5xx response, no-op otherwise
    decoded = response.json()

    # "data" is expected to hold exactly one entry per queried gene
    decoded_data = decoded.get("data", [])
    if len(decoded_data) == 0:
        homologies = []
    elif len(decoded_data) == 1:
        homologies = decoded_data[0].get("homologies", [])
    else:
        raise Exception(f"For {gene_id} in gene_id_list, decoded_data length was {len(decoded_data)} (expected: 1)")

    print("\t... retrieved! Data length:", len(homologies))

    gene_homologies_dict[gene_id] = homologies
    time.sleep(0.2)  # short pause between requests
python REST homology biomart ensembl • 695 views

possible to download whole database and query locally?

http://ftp.ensembl.org/pub/current_compara/


So for looking at homologs of human proteins, do I want the following file?

http://ftp.ensembl.org/pub/current_compara/conservation_scores/91_mammals.gerp_conservation_score/gerp_conservation_scores.homo_sapiens.GRCh38.bw

And if so, what do I do with a bigWig file? I've never worked with those before. Don't they just contain genomic data? It's protein homologs I want...


Homologies can be found in the following directory on the Ensembl FTP: http://ftp.ensembl.org/pub/current_emf/ensembl-compara/homologies/
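
Once a homology dump is on disk, the per-gene REST loop can be replaced by a single pass over the file. Below is a minimal sketch, not a definitive recipe: it assumes you have a tab-separated dump with a header row (optionally gzipped), and that the query gene ID sits in a column named `gene_stable_id`. That column name and the `load_homologies` helper are my assumptions, so check the header of the file you actually download and adjust `gene_col` accordingly.

```python
import csv
import gzip
from collections import defaultdict


def load_homologies(path, gene_col="gene_stable_id"):
    """Index a tab-separated homology dump by query gene ID.

    gene_col is an assumed column name; pass the real one from your file's header.
    Returns a dict mapping each gene ID to a list of row dicts (one per homolog).
    """
    # Transparently handle gzipped dumps based on the file extension
    opener = gzip.open if str(path).endswith(".gz") else open
    index = defaultdict(list)
    with opener(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            index[row[gene_col]].append(row)
    return dict(index)
```

With the index built, a lookup along the lines of `{g: homologs.get(g, []) for g in gene_id_list}` (where `homologs` is the returned dict) would give a structure similar to `gene_homologies_dict` in the question, without any network requests or sleeps.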
