Question: Downloading genomes from OmaDB with PyOMADB or OMA API
25 days ago, hannah.muelbaier wrote:

Hello OMA team,

I want to write a tool that automatically downloads sequences from OmaDB, and I would like to use the new PyOMADB library for this. I need not just a few protein sequences but the complete set of protein sequences for some species. In the PyOMADB documentation I wasn't able to find a command to download multiple FASTA sequences at once (for example, all proteins from one species). Is it possible to get whole genomes via PyOMADB or the OMA API, or is there a way to download many sequences with one command rather than only a few?

Thank you and kind regards Hannah

oma orthologs • 94 views
modified 24 days ago by adrian.altenhoff • written 25 days ago by hannah.muelbaier
24 days ago, adrian.altenhoff (Switzerland) wrote:

Dear Hannah,

There is no way to directly download whole genomes in FASTA format with the PyOMADB tool. However, you can retrieve multiple proteins at once:

import omadb
c = omadb.Client()
ids = ["HUMAN{:05d}".format(x) for x in range(1, 55)]
c.proteins.info(ids)

Note that the number of proteins you can query at once is currently limited to 100. If you would like to download all the sequences from a genome, you could do something like this:

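# Look up the genome to find out how many proteins it contains
genome = c.genomes.genome("ECOLI")
nr_ecoli_proteins = genome['nr_entries']
# Build the IDs: species code followed by a 5-digit, 1-based entry number
prot_ids = ["ECOLI{:05d}".format(x) for x in range(1, nr_ecoli_proteins + 1)]
res = []
# Query in chunks of 100, the current per-request limit
for x in range(0, len(prot_ids), 100):
    chunk = prot_ids[x:x+100]
    res.extend(c.proteins.info(chunk))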

Best wishes, Adrian

written 24 days ago by adrian.altenhoff

It may be important to mention that for this exact use case, downloading a FASTA file of protein sequences for one or more whole genomes, it is much more efficient to use the download link for all protein sequences in FASTA format and filter for the genomes of interest. The URL for the latest version of the protein sequences is https://omabrowser.org/All/oma-seqs.fa.gz, which is also accessible from the Download menu.

written 19 days ago by adrian.altenhoff
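
For the bulk-download route, filtering the downloaded file for the genomes of interest could look roughly like the sketch below. It is not part of PyOMADB. It assumes that each FASTA header starts with the OMA ID (species code plus entry number, as in ECOLI00001), possibly after a space, which should be checked against the actual file. The function names and the file paths in the usage note are made up for illustration.

```python
import gzip


def filter_fasta_by_species(lines, species_codes):
    """Yield the FASTA lines of all records whose ID starts with a given species code.

    Assumption: the header line begins with the OMA ID, e.g. ">ECOLI00001"
    or "> ECOLI00001". Verify this against the downloaded file.
    """
    codes = tuple(species_codes)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record whether to keep it
            entry_id = line[1:].strip()
            keep = entry_id.startswith(codes)
        if keep:
            yield line


def filter_gzipped_fasta(in_path, out_path, species_codes):
    """Stream a gzipped FASTA file and write only the selected species to out_path."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta_by_species(src, species_codes))
```

With oma-seqs.fa.gz already downloaded, a call such as filter_gzipped_fasta("oma-seqs.fa.gz", "selected.fa", ["ECOLI", "HUMAN"]) would write the records for those two species to selected.fa (hypothetical output name). The file is read line by line, so it never has to fit into memory.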