Ensembl REST: How to retrieve CDS from a protein id?
7.4 years ago
harlan ▴ 20

The goal is simple, but I cannot figure out a way to make it work in any of the REST endpoints. How can I retrieve the CDS (DNA) corresponding to a Protein ID? Not relevant, but I am working in Python.

There are two approaches I thought would be possible:

  1. A direct retrieval using the "/sequence/id/" endpoint, with 'cds' as the 'type'.
  2. Identifying the corresponding transcript id from the protein id, and then using that transcript id in the "/sequence/id/" endpoint. I cannot find a way to identify the corresponding transcript id.

Neither of these approaches has been fruitful. This seems like such a simple and common need that surely I am just unaware of the proper way to get it done.

Any help would be much appreciated!


What protein ID do you have? Can you please give an example?


Hi Bert!

Taking as input an Ensembl protein id 'ENSP00000430656' I would like to retrieve the corresponding transcript id, which is 'ENST00000523953'.

From there, the goal is to retrieve the CDS, which can be done with the '/sequence/id/' endpoint in the following way: 'http://rest.ensembl.org/sequence/id/ENST00000523953?content-type=application/json;type=cds'.
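For reference, a minimal Python sketch of that second step (transcript id to CDS). It uses only the standard library, and it assumes the JSON response carries the sequence under a `seq` key and that the `https` form of the server address is acceptable. The helper names `cds_url` and `fetch_cds` are made up for illustration.

```python
import json
import urllib.request

# Assumption: the public REST server, addressed over https.
REST_SERVER = "https://rest.ensembl.org"


def cds_url(transcript_id, server=REST_SERVER):
    """Build the /sequence/id URL that asks for the CDS of a transcript."""
    return f"{server}/sequence/id/{transcript_id}?type=cds"


def fetch_cds(transcript_id, server=REST_SERVER):
    """Download the CDS as a string (needs network access)."""
    request = urllib.request.Request(
        cds_url(transcript_id, server),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the sequence is returned under the "seq" key.
        return json.load(response)["seq"]
```

Calling `fetch_cds("ENST00000523953")` should then return the coding sequence as plain text, provided the transcript id is already known.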

I just have not been able to find any way to retrieve a corresponding Transcript id using just the Protein id as input.

The best solution so far involves using the Gene id in an 'Overlap' query as such:

'http://rest.ensembl.org/overlap/id/ENSG00000133742?feature=gene;content-type=application/json;feature=cds'

This is the only location I have identified where both Protein id and corresponding Transcript id exist in the same entry.
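A rough sketch of how that workaround could look in Python. It assumes each CDS record returned by the overlap query carries the protein id under `protein_id` and the transcript id under `Parent`; those key names are my reading of the output and should be checked against a real response. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the public REST server, addressed over https.
REST_SERVER = "https://rest.ensembl.org"


def fetch_cds_features(gene_id, server=REST_SERVER):
    """Fetch the CDS features overlapping a gene (needs network access)."""
    url = f"{server}/overlap/id/{gene_id}?feature=cds"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcripts_for_protein(cds_features, protein_id):
    """Return the transcript ids whose CDS records name this protein.

    Assumption: records look like {"protein_id": ..., "Parent": ...}.
    One transcript usually yields several CDS records (one per coding
    exon), so the ids are de-duplicated with a set.
    """
    return sorted(
        {
            feature["Parent"]
            for feature in cds_features
            if feature.get("protein_id") == protein_id and "Parent" in feature
        }
    )
```

With the ids from this thread, `transcripts_for_protein(fetch_cds_features("ENSG00000133742"), "ENSP00000430656")` would be expected to yield a single-element list holding the matching transcript id, if the key names are as assumed.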

-Harlan


I forwarded your question to one of the experts, i.e. Magali from the Ensembl team. Please see her answer below.

7.4 years ago

Hi Harlan,

Unfortunately, we currently do not provide a way to retrieve a transcript from its protein, only the other way around.

As you have correctly identified, you can use the overlap endpoint if you know the Ensembl stable id of the gene or the region.

If you know the gene symbol, you can also use the lookup endpoint.
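As an illustration of the lookup route, here is a small Python sketch. It assumes the `/lookup/symbol/:species/:symbol` form of the endpoint with `expand=1`, and that the expanded gene record lists its transcripts under `Transcript`, each with an optional `Translation` object holding the protein id. Those key names and the helper names are assumptions to verify against an actual response.

```python
import json
import urllib.request

# Assumption: the public REST server, addressed over https.
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(symbol, species="homo_sapiens", server=REST_SERVER):
    """Build a lookup URL that expands the gene into its transcripts."""
    return f"{server}/lookup/symbol/{species}/{symbol}?expand=1"


def fetch_gene_record(symbol, species="homo_sapiens", server=REST_SERVER):
    """Fetch the expanded gene record (needs network access)."""
    request = urllib.request.Request(
        lookup_symbol_url(symbol, species, server),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcript_for_protein(gene_record, protein_id):
    """Return the id of the transcript that encodes protein_id, or None.

    Assumption: gene_record["Transcript"] is a list of transcript records,
    and coding transcripts carry a "Translation" record with an "id".
    """
    for transcript in gene_record.get("Transcript", []):
        translation = transcript.get("Translation") or {}
        if translation.get("id") == protein_id:
            return transcript["id"]
    return None
```

Non-coding transcripts have no translation, which is why a missing `Translation` entry is treated as "no match" rather than as an error.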

You have an interesting use case, though, and we will look into ways of providing better support for it, either by adding more fields to existing endpoints or by creating a new one.

If you have suggestions as to what would work for you, I would be happy to hear them.

Regards,
Magali


Magali,

Thank you for the response. Because we are generally working within one protein family and a limited number of genes, using the Ensembl stable gene id should work in this instance. Thank you for providing the 'lookup' endpoint approach as well.

A suggestion for future use would be that any record that contains an Ensembl protein id or transcript id should also have the other as an additional attribute, as well as the parent Ensembl gene id. This would make every record accessible from all directions. Perhaps that is a bit of a tall order though?

Thanks again!


Following your suggestion, we have added a 'Parent' field for translation endpoints in our latest release, for example:

http://rest.ensembl.org/overlap/translation/ENSP00000288602?content-type=application/json
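A short Python sketch of how the new field might be used. It assumes the response is a JSON list of records and that each record exposes the transcript id under the `Parent` key mentioned above. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the public REST server, addressed over https.
REST_SERVER = "https://rest.ensembl.org"


def fetch_translation_overlap(protein_id, server=REST_SERVER):
    """Fetch the records overlapping a translation (needs network access)."""
    url = f"{server}/overlap/translation/{protein_id}"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parent_transcripts(records):
    """Collect the distinct, non-empty 'Parent' values from the records."""
    return sorted({r["Parent"] for r in records if r.get("Parent")})
```

The transcript id obtained from `parent_transcripts(fetch_translation_overlap("ENSP00000288602"))` can then be passed to the '/sequence/id/' endpoint with 'type=cds' to get the coding sequence.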

Hope that helps and please do let us know about additional features.


Truly excellent, Magali! Thanks for incorporating this feature. Hopefully it will be useful for others as well.

6.6 years ago
Tariq Daouda ▴ 210

Hi,

There's a Python package called pyGeno that can do that for you. It can retrieve anything from anything, including transcripts from proteins: you simply have to do protein.transcript. The same goes for proteins from transcripts, proteins from genes, genes from proteins, exons from transcripts, etc. Any combination of genes, transcripts, exons, proteins, chromosomes and genomes is possible.

Cheers, 
