Question

Downloading genomes from OmaDB with PyOMADB or OMA API

5.0 years ago

hannah.muelbaier • 0

Hello OMA team,

I want to write a tool that automatically downloads sequences from OmaDB, and I would like to use the new PyOMADB library for this. I need not just a few protein sequences but the complete set of protein sequences for the genomes of some species. In the PyOMADB documentation I wasn't able to find a command to download multiple FASTA sequences (for example, all proteins from one species) at once. Is it possible to get whole genomes through PyOMADB or the OMA API, or to download many sequences with one command? Or can only a few be retrieved at a time?

Thank you and kind regards, Hannah

OMA orthologs • 1.2k views

score 2 · Accepted Answer · 2019-05-02

5.0 years ago

Adrian Altenhoff ★ 1.1k

Dear Hannah,

There is no way to download whole genomes directly in FASTA format with the PyOMADB tool. However, you can load multiple proteins at once:

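import omadb

c = omadb.Client()
# Build the OMA IDs HUMAN00001 .. HUMAN00054
ids = ["HUMAN{:05d}".format(x) for x in range(1, 55)]
# Retrieve the protein entries for all IDs in one request
c.proteins.info(ids)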

Note that the number of proteins you can query at once is currently limited to 100. If you would like to download all the sequences from a genome, you could do something like this:

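# Look up the genome to find out how many protein entries it has
genome = c.genomes.genome("ECOLI")
nr_ecoli_proteins = genome['nr_entries']
# Build the OMA IDs ECOLI00001 .. ECOLI<nr_entries>
prot_ids = ["ECOLI{:05d}".format(x) for x in range(1, nr_ecoli_proteins + 1)]
res = []
# Query in chunks of 100 IDs to stay within the per-request limit
for x in range(0, len(prot_ids), 100):
    chunk = prot_ids[x:x + 100]
    res.extend(c.proteins.info(chunk))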
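
To write the collected entries out as FASTA, a minimal sketch along these lines might help. It assumes each entry behaves like a dictionary with `omaid` and `sequence` keys; those field names are an assumption and should be checked against what `c.proteins.info` actually returns. `to_fasta` is a hypothetical helper, not part of PyOMADB.

```python
def to_fasta(entries, id_key="omaid", seq_key="sequence", width=60):
    """Format dict-like entries as FASTA text.

    id_key and seq_key are ASSUMED field names; adjust them to match
    the entries actually returned by the API.
    """
    lines = []
    for entry in entries:
        # Header line: '>' followed by the identifier
        lines.append(">" + str(entry[id_key]))
        seq = entry[seq_key]
        # Wrap the sequence into fixed-width lines
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    # One newline after every line; empty input gives an empty string
    return "".join(line + "\n" for line in lines)
```

Under these assumptions, the result of `to_fasta(res)` could be written to a file with an ordinary `open(..., "w")` call.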

Best wishes, Adrian

It may be worth mentioning that for this exact use case, obtaining a FASTA file of protein sequences for one or more whole genomes, it is much more efficient to download the file containing all protein sequences in FASTA format and filter it for the genomes of interest. The latest version of the protein sequences is available at https://omabrowser.org/All/oma-seqs.fa.gz, which is also accessible from the Download menu.
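
The filtering step could be sketched roughly as follows. This assumes the file has already been downloaded locally and that each FASTA header starts with the OMA ID, whose first five characters are the species code (for example `ECOLI00001`), optionally preceded by a space. That header layout is an assumption and should be checked against the actual file. `filter_fasta` and `filter_file` are hypothetical helper names.

```python
import gzip

def filter_fasta(lines, species_codes):
    """Yield the lines of FASTA records whose ID starts with a wanted code."""
    wanted = tuple(species_codes)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Header line: drop '>' and take the first whitespace-separated
            # token, ASSUMED to be the OMA ID (species code + number).
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0].startswith(wanted)
        if keep:
            yield line

def filter_file(src_path, dst_path, species_codes):
    """Stream a gzipped FASTA file and write the selected records as text."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(filter_fasta(src, species_codes))
```

With the download saved as `oma-seqs.fa.gz`, a call such as `filter_file("oma-seqs.fa.gz", "selected.fa", ["ECOLI", "HUMAN"])` would stream through the file once without loading it into memory.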

5.0 years ago by Adrian Altenhoff ★ 1.1k