Question: Download cDNA sequences from OMA
Biojl wrote:

Hi, I am trying to obtain the cDNA sequences for ortholog groups. There is a "Download: Fasta" button, but apparently it only works for the protein sequences. Does anybody know if there is a way to download them in bulk, as opposed to downloading every single sequence individually?

Tags: oma, orthologs
written 22 days ago by Biojl

Have you looked here? There seems to be a download for all cDNA sequences.

written 22 days ago by lieven.sterck

Yes, this is what I did, but it would be nice to have a bulk download of cDNA for selected sequences, just as there is for proteins.

written 22 days ago by Biojl

But there is, no?

cDNA Eukaryotes: Fasta format

That should be all the eukaryotic cDNAs.

written 22 days ago by lieven.sterck

There is not. I don't want to download a 2.4 GB compressed file and parse it every time I want the cDNA of a bunch of orthologs. When you search for a particular protein, there is an option to download the protein sequences of all its orthologs. I was looking for the same thing for cDNA.

written 22 days ago by Biojl

Aha, OK, true, I see your issue now.

Still, if the IDs are consistent, I would download the cDNA file once, format it as a BLAST database, and repeatedly query that for the CDSs I need.

written 22 days ago by lieven.sterck
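
The "download once, index, query repeatedly" idea above can also be sketched without BLAST. The following is a minimal Python sketch that swaps the BLAST database for a plain byte-offset index. It assumes the cDNA file has already been decompressed, and that the first word of each FASTA header is the identifier you want to look up (the thread does not confirm the header layout). The function names and the example IDs are hypothetical.

```python
import os
import tempfile


def index_fasta(path):
    """Map each record id (first word after '>') to the byte offset of its header."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:
                    index[parts[0].decode()] = offset
    return index


def fetch_record(path, index, seq_id):
    """Return (header, sequence) for seq_id by seeking to its stored offset."""
    with open(path, "rb") as handle:
        handle.seek(index[seq_id])
        header = handle.readline().decode().rstrip("\r\n")
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            chunks.append(line.decode().strip())
    return header, "".join(chunks)


if __name__ == "__main__":
    # Tiny stand-in for the real cDNA file; the ids below are made up.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">SPECA00001 demo\nATGAAA\nTTTTAA\n>SPECB00002\nATGCCCTAG\n")
        demo_path = tmp.name
    idx = index_fasta(demo_path)  # build once, reuse for every lookup
    for wanted in ["SPECB00002", "SPECA00001"]:
        header, seq = fetch_record(demo_path, idx, wanted)
        print("{}\n{}".format(header, seq))
    os.remove(demo_path)
```

The index only stores offsets, so it stays small even for a multi-gigabyte file, and it could be saved to disk (for example as JSON) to avoid re-scanning the file on every run.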
adrian.altenhoff wrote:

Dear Biojl,

Currently there is no way to directly get the CDS sequences for all the proteins in an OMA group or HOG from the web interface. However, you can get them with fairly little programming effort from the REST API. Here is one possible way to retrieve them in Python and output them as FASTA:

import json
import requests

grp = 12345  # OMA group id (example value)
# Fetch the group and collect the OMA ids of its members
group = json.loads(requests.get('https://omabrowser.org/api/group/{}/'.format(grp)).content.decode())
group_member_entries = [p['omaid'] for p in group['members']]

# Retrieve the full protein entries (including the cDNA) in a single bulk request
reply = requests.post('https://omabrowser.org/api/protein/bulk_retrieve/', json={"ids": group_member_entries})
group_members = json.loads(reply.content.decode())
# Print each member as a FASTA record
for memb in group_members:
    print(">{}\n{}\n\n".format(memb['omaid'], memb['cdna']))
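
The loop above prints each sequence on a single line. If wrapped FASTA output is preferred, a small helper along these lines could be used. It is only a sketch: it assumes the records are dictionaries with 'omaid' and 'cdna' keys, as in the snippet above, and the name to_fasta and the example IDs are hypothetical.

```python
import textwrap


def to_fasta(records, width=60):
    """Format records (dicts with 'omaid' and 'cdna' keys) as wrapped FASTA text."""
    lines = []
    for rec in records:
        lines.append(">{}".format(rec["omaid"]))
        # textwrap.wrap splits the unbroken sequence into chunks of at most `width`
        # characters and returns [] for an empty string.
        lines.extend(textwrap.wrap(rec["cdna"], width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up records standing in for the bulk_retrieve reply.
    demo = [
        {"omaid": "SPECA00001", "cdna": "ATGAAATTTTAA"},
        {"omaid": "SPECB00002", "cdna": "ATGTAG"},
    ]
    print(to_fasta(demo, width=5), end="")
```

Returning a string rather than printing directly makes it easy to write the result to a file with a single write call.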

We agree that this would be a generally useful feature and will therefore implement it for the next release.

written 22 days ago by adrian.altenhoff

Thank you very much!

written 21 days ago by Biojl