Downloading genomes from OmaDB with PyOMADB or OMA API
3.6 years ago

Hello OMA team,

I want to write a tool that automatically downloads sequences from OmaDB, and I would like to use the new PyOMADB library for this. I need not just a few protein sequences but all protein sequences of the genomes of some species. In the PyOMADB documentation I wasn't able to find a command to download multiple FASTA sequences at once (for example, all proteins from one species). Is it possible to get whole genomes via PyOMADB or the OMA API, or at least to download many sequences with a single command or a few commands?

Thank you and kind regards Hannah

OMA orthologs • 876 views
3.6 years ago

Dear Hannah,

there is no way to directly download whole genomes in FASTA format with the PyOMADB tool. However, you can load multiple proteins at once:

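import omadb
c = omadb.Client()
# Build the OMA IDs HUMAN00001 .. HUMAN00054
ids = ["HUMAN{:05d}".format(x) for x in range(1, 55)]
# Assumption: proteins.info() accepts a list of IDs and performs a bulk lookup
prots = c.proteins.info(ids)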

Note that the number of proteins you can query at once is currently limited to 100. If you would like to download all the sequences from a genome, you could do something like this:

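genome = c.genomes.genome("ECOLI")
nr_ecoli_proteins = genome['nr_entries']
prot_ids = ["ECOLI{:05d}".format(x) for x in range(1, nr_ecoli_proteins + 1)]
res = []
# Query in chunks of 100 IDs to stay within the bulk limit
for x in range(0, len(prot_ids), 100):
    chunk = prot_ids[x:x+100]
    # Assumption: proteins.info() accepts a list of IDs (bulk lookup)
    res.extend(c.proteins.info(chunk))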
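
Once the protein records have been collected, they still need to be written out as FASTA. The following is only a minimal sketch: it assumes each record can be treated as a plain dict with an "omaid" and a "sequence" key, which may not match what PyOMADB actually returns (its response objects might need converting first). The function name to_fasta is made up for illustration.

```python
def to_fasta(records, width=60):
    """Format records as FASTA text.

    Assumption: each record is a mapping with 'omaid' and 'sequence' keys.
    """
    lines = []
    for rec in records:
        seq = rec["sequence"]
        lines.append(">" + rec["omaid"])
        # Wrap the sequence at a fixed line width
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)
```

The returned string can then be written to a file with an ordinary open(..., "w") call.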

Best wishes, Adrian


It may be worth mentioning that for this exact use case, obtaining a FASTA file of protein sequences for one or more whole genomes, it is much more efficient to download the file of all protein sequences in FASTA format and filter it for the genomes of interest. The URL for the latest release of the protein sequences is available from the Download menu.
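
Filtering the downloaded file could look roughly like the sketch below. It assumes that each FASTA header starts with an OMA ID made of a species code followed by digits (for example ECOLI00001), and that the file is gzip-compressed; both should be checked against the file you actually download. The function names and file paths are placeholders, not part of any OMA tooling.

```python
import gzip


def filter_fasta_by_species(lines, species_codes):
    """Yield only the FASTA lines belonging to the given species codes.

    Assumption: the first token of each header is an OMA ID such as
    'ECOLI00001', i.e. a species code followed by a running number.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Strip the trailing digits to recover the species code
            code = tokens[0].rstrip("0123456789") if tokens else ""
            keep = code in species_codes
        if keep:
            yield line


def filter_fasta_file(in_path, out_path, species_codes):
    """Stream a gzip-compressed FASTA file and write the filtered records."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta_by_species(src, set(species_codes)))
```

For example, filter_fasta_file("all_proteins.fa.gz", "ecoli.fa", {"ECOLI"}) would keep only the E. coli entries, where the input file name is a stand-in for whatever the download is called. Because the file is streamed line by line, it never has to fit into memory.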

