
Download cdna sequences from OMA


5.4 years ago

Biojl ★ 1.7k

Hi, I am trying to obtain the cDNA sequences for ortholog groups. There is a "Download: Fasta" button, but apparently it only works for protein sequences. Does anybody know whether there is a way to download the cDNAs in bulk rather than downloading every sequence individually?

oma orthologs • 1.5k views

updated 5.4 years ago by Adrian Altenhoff ★ 1.1k • written 5.4 years ago by Biojl ★ 1.7k


Have you looked here? There seems to be a download for all cDNA sequences.

5.4 years ago by lieven.sterck 15k


Yes, this is what I did, but it would be nice to have a bulk cDNA download for selected sequences, just as there is for proteins.

5.4 years ago by Biojl ★ 1.7k


But there is, no?

cDNA Eukaryotes: Fasta format

which should contain all the eukaryotic cDNAs.

5.4 years ago by lieven.sterck 15k


There is not. I don't want to download a 2.4 GB compressed file and parse it every time I want the cDNA of a handful of orthologs. When you search for a particular protein, there is an option to download all of its orthologs' protein sequences. I was looking for the same thing for cDNA.

5.4 years ago by Biojl ★ 1.7k


Aha, OK, true, I get your issue now.

Still, if the IDs are consistent, I would download the cDNA file once, format it as a BLAST database, and repeatedly query that for the CDSs I need.
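A lighter-weight alternative to building a BLAST database is to stream the downloaded cDNA FASTA once per batch of IDs and keep only the records you need. The sketch below is a swapped-in technique, not lieven's exact BLAST workflow. It assumes that the first whitespace-separated token of each FASTA header is the OMA ID you query with; the function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs, one record at a time.

    The id is the first whitespace-separated token of the header line
    (assumed here to be the OMA ID)."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def extract(handle: TextIO, wanted: Iterable[str]) -> Dict[str, str]:
    """Single pass over the file, keeping only records whose id is in `wanted`."""
    wanted = set(wanted)
    return {sid: seq for sid, seq in read_fasta(handle) if sid in wanted}


def load_subset(path: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Open a plain or gzip-compressed FASTA file and extract the wanted ids."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return extract(fh, wanted)


def to_fasta(records: Dict[str, str]) -> str:
    """Serialise {id: sequence} back to FASTA text."""
    return "".join(">{}\n{}\n".format(sid, seq) for sid, seq in records.items())
```

For example, `to_fasta(load_subset("oma-eukaryotes.cdna.fa.gz", ids))` (hypothetical file name; `ids` being the OMA IDs of the group) returns the selected cDNAs as FASTA text. Because records are streamed, memory use scales with the selected sequences rather than with the whole file, although each call still reads the full file from disk.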

5.4 years ago by lieven.sterck 15k

score 2 · Answer 1 · 2018-11-20

Dear Biojl,

Currently there is no way to get the CDS sequences of all proteins in an OMA group or HOG directly from the web interface. However, you can get them with very little programmatic effort from the REST API. Here is one way to do it in Python, printing the result as FASTA:

import requests

grp = 12345  # OMA group number
base = 'https://omabrowser.org/api'

# fetch the group and collect the OMA IDs of its members
group = requests.get('{}/group/{}/'.format(base, grp)).json()
group_member_entries = [p['omaid'] for p in group['members']]

# retrieve all member entries (including their cDNA) in one request
reply = requests.post('{}/protein/bulk_retrieve/'.format(base), json={"ids": group_member_entries})
reply.raise_for_status()
group_members = reply.json()
for memb in group_members:
    print(">{}\n{}".format(memb['omaid'], memb['cdna']))
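If you want to reuse the snippet for many groups, the same two endpoints can be wrapped in a function. This sketch uses only the standard library, sends member IDs to `bulk_retrieve` in batches, and wraps sequences at 60 characters. The batch size of 100 is an assumption, not a documented limit, and the function names are hypothetical. It relies on the same response fields the answer's code uses (`members`, `omaid`, `cdna`). The fetchers are passed in as parameters so the FASTA logic can be tested offline.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

API = "https://omabrowser.org/api"
BATCH_SIZE = 100  # assumption: conservative batch size; check the OMA API docs for the real limit


def _get_json(url: str, payload=None):
    """GET the URL, or POST `payload` as JSON when given, and decode the reply."""
    data = None if payload is None else json.dumps(payload).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # urllib sends POST when data is set
        return json.loads(resp.read().decode())


def fetch_member_ids(group_id: int) -> List[str]:
    """OMA IDs of all members of an OMA group."""
    return [m["omaid"] for m in _get_json("{}/group/{}/".format(API, group_id))["members"]]


def fetch_entries(ids: List[str]) -> List[dict]:
    """Full protein entries (including 'cdna') for a list of OMA IDs."""
    return _get_json("{}/protein/bulk_retrieve/".format(API), {"ids": ids})


def group_cdna_fasta(group_id: int,
                     get_ids: Callable[[int], List[str]] = fetch_member_ids,
                     get_entries: Callable[[List[str]], List[dict]] = fetch_entries,
                     width: int = 60) -> Iterator[str]:
    """Yield one FASTA record per group member, querying in batches."""
    ids = get_ids(group_id)
    for start in range(0, len(ids), BATCH_SIZE):
        for entry in get_entries(ids[start:start + BATCH_SIZE]):
            seq = entry["cdna"]
            lines = [seq[i:i + width] for i in range(0, len(seq), width)]
            yield ">{}\n{}\n".format(entry["omaid"], "\n".join(lines))
```

Usage, for example: `with open("group_12345_cdna.fa", "w") as out: out.writelines(group_cdna_fasta(12345))`.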

We also think this could be a generally useful feature, so we will implement it in the next release.