Question: Ensembl API, command-line argument
oars wrote:

I'm experimenting with the Ensembl API and trying to write a script where I can specify a gene (using the Ensembl ID) via a command-line argument. Specifically, I'm trying to extract the CDS sequence for each transcript associated with the gene given on the command line.

On the REST API website, I found the following script for retrieving the CDS sequence:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/ENST00000288602?type=cds"

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print(r.text)

The above code works perfectly; however, I cannot quite get the command-line argument version to work. So far, this is what I've got:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/gene?type=cds"

gene=sys.argv[1]

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print(r.text)

I think I'm close? Maybe not??

I run the script with the Ensembl gene ID as its only argument, e.g.:

$ python file.py ENSG00000186642

Question written 12 weeks ago by oars

Alex Reynolds wrote:

Perhaps try:

import requests, sys, errno

server = "http://rest.ensembl.org"

if len(sys.argv) < 2:
  sys.exit(errno.EINVAL)  # no ID was given on the command line

gene = sys.argv[1]

ext = "/sequence/id/%s?type=cds" % (gene)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print(r.text)

About errno: https://docs.python.org/2/library/errno.html

About Python string formatting: https://pyformat.info/
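
For what it's worth, here is a minimal Python 3 sketch of the same idea, using argparse so that a missing argument produces a usage message instead of a traceback. The function names (build_cds_url, parse_args, fetch_fasta, main) are made up for illustration; only the server, the /sequence/id/...?type=cds path and the text/x-fasta header come from the script above.

```python
import argparse

SERVER = "http://rest.ensembl.org"  # same server as in the scripts above


def build_cds_url(stable_id, server=SERVER):
    """Return the REST URL that asks for the CDS of one stable ID."""
    return "{}/sequence/id/{}?type=cds".format(server, stable_id)


def parse_args(argv=None):
    """Read the stable ID from the command line (argv=None means sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Fetch a CDS from the Ensembl REST API.")
    parser.add_argument("stable_id",
                        help="Ensembl stable ID, e.g. ENST00000288602")
    return parser.parse_args(argv)


def fetch_fasta(url):
    """Download the URL as FASTA text; raises requests.HTTPError on 4xx/5xx."""
    # Third-party package; imported here so the helpers above work without it.
    import requests

    r = requests.get(url, headers={"Content-Type": "text/x-fasta"})
    r.raise_for_status()
    return r.text


def main(argv=None):
    """Hypothetical entry point: parse the ID, fetch the CDS, print it."""
    args = parse_args(argv)
    print(fetch_fasta(build_cds_url(args.stable_id)))
```

Saved as a script with main() called at the bottom, this would be run as `python file.py ENST00000288602`.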

Answer written 12 weeks ago by Alex Reynolds

Fantastic, many thanks for the reply and the references. This performed beautifully for ENSG00000169174; however, some other Ensembl IDs throw the following error:

Traceback (most recent call last):
  File "WEEK10.py", line 15, in <module>
    r.raise_for_status()
  File "/Users/oars/anaconda/lib/python2.7/site-packages/requests/models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://rest.ensembl.org/sequence/id/ENSG00000186642?type=cds

Perhaps this is coming from Ensembl's API and not from the structure of the script?

Reply written 12 weeks ago by oars

Not sure if you need to query a different way? Without type=cds, the same gene ID does return a sequence:

$ wget --header="Content-Type:text/x-fasta" http://rest.ensembl.org/sequence/id/ENSG00000186642
>ENSG00000186642 chromosome:GRCh38:11:72576141:72674591:-1
GTTTATCTCTCAGTCTCTCTGTCTGTGAGTCTTTTTTCCTCTCTCCCAGTCAGACTCTCT
CTCTACCCCTCCCTCTCTCCCTCTCTCCCTCTCTGTCTGGGCCTCTCTCTGTTCCTCCTC
...
GTGAAGGTGTCTCCAACAGGCTTGATGTGTAGGCATTATTGTAAGTTTGCAACTTCTTGG

I don't really grok Ensembl, but there's someone on here who can probably help you with debugging their REST API.
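
One thing that might help with the debugging: print what the server sent back rather than only the HTTPError. A rough sketch, assuming the response body carries an explanation of the 400 (I have not checked that for this endpoint); describe_failure and fetch_or_explain are hypothetical helper names.

```python
import sys


def describe_failure(status_code, body, url):
    """Build a one-line description of a failed request, including the reply body."""
    detail = body.strip() or "(empty response body)"
    return "HTTP {} for {}: {}".format(status_code, url, detail)


def fetch_or_explain(url):
    """Return the FASTA text, or exit with the server's own message on failure."""
    # Third-party package; imported here so describe_failure works without it.
    import requests

    r = requests.get(url, headers={"Content-Type": "text/x-fasta"})
    if not r.ok:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(describe_failure(r.status_code, r.text, url))
    return r.text
```

If the body turns out to be empty, the message at least says so, which points back at the request rather than at the script.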

Reply written 12 weeks ago by Alex Reynolds

Thanks again, Alex. I think my issue was that gene IDs (often) won't map to a CDS sequence. I switched all references in the code from gene to transcript_id, and everything seems to work!

import requests, sys, errno

server = "http://rest.ensembl.org"

if len(sys.argv) < 2:
  sys.exit(errno.EINVAL)  # no ID was given on the command line

transcript_id = sys.argv[1]

ext = "/sequence/id/%s?type=cds" % (transcript_id)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()

print(r.text)
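
To get back to the original goal (a CDS for every transcript of a gene), a possible next step is to look up the gene's transcripts first and then request each CDS. This is only a sketch in Python 3: it assumes the REST API's /lookup/id endpoint accepts expand=1 and returns JSON with a "Transcript" list whose entries have "id" and "biotype" keys, which should be checked against the Ensembl documentation. All function names are hypothetical.

```python
SERVER = "http://rest.ensembl.org"  # same server as in the script above


def transcript_ids(gene_record, biotype="protein_coding"):
    """Pick the transcript IDs of one biotype out of an expanded gene record."""
    return [t["id"] for t in gene_record.get("Transcript", [])
            if t.get("biotype") == biotype]


def fetch_gene_record(gene_id, server=SERVER):
    """Fetch the gene as JSON (assumed endpoint: /lookup/id/<id>?expand=1)."""
    import requests  # third-party; imported here so transcript_ids works without it

    r = requests.get("{}/lookup/id/{}".format(server, gene_id),
                     params={"expand": 1},
                     headers={"Content-Type": "application/json"})
    r.raise_for_status()
    return r.json()


def fetch_cds(transcript_id, server=SERVER):
    """Fetch one transcript's CDS as FASTA text, as in the script above."""
    import requests

    r = requests.get("{}/sequence/id/{}".format(server, transcript_id),
                     params={"type": "cds"},
                     headers={"Content-Type": "text/x-fasta"})
    r.raise_for_status()
    return r.text


def main(gene_id):
    """Print the CDS of every protein-coding transcript of the gene."""
    for tid in transcript_ids(fetch_gene_record(gene_id)):
        print(fetch_cds(tid))
```

The biotype filter is there because transcripts without a coding sequence would presumably fail the type=cds request the same way the gene ID did.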

Reply written 12 weeks ago by oars