Question

Ensembl API, command-line argument

0

7.1 years ago

oars ▴ 200

I'm experimenting with the Ensembl API and trying to write a script where I can specify a gene (using the Ensembl ID) via a command line argument. Specifically, I'm trying to extract the CDS sequence for each transcript associated with a gene provided via the command-line argument.

From the Ensembl REST API website, I found the following script for retrieving the CDS sequence:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/ENST00000288602?type=cds"

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print r.text

The above code works perfectly; however, I cannot quite get the command-line argument version to work. This is what I have so far:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/gene?type=cds"

gene=sys.argv[1]

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print r.text

I think I'm close? Maybe not??

I run the script with the Ensembl gene ID as its only argument, e.g.:

$ python file.py ENSG00000186642

ensembl API bash python CDS • 2.5k views

ADD COMMENT • link updated 7.1 years ago by Alex Reynolds 35k • written 7.1 years ago by oars ▴ 200

score 3 · Accepted Answer · 2017-09-23

3

7.1 years ago

Alex Reynolds 35k

Perhaps try:

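import requests, sys, errno

server = "http://rest.ensembl.org"

# sys.argv[1] raises IndexError when no argument is given, so check the length first
if len(sys.argv) < 2:
  sys.exit(errno.EINVAL)

gene = sys.argv[1]

# substitute the ID from the command line into the endpoint path
ext = "/sequence/id/%s?type=cds" % (gene)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

# raise_for_status() raises requests.exceptions.HTTPError on a 4xx/5xx response
if not r.ok:
  r.raise_for_status()
  sys.exit()

print(r.text)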

About errno: https://docs.python.org/2/library/errno.html

About Python string formatting: https://pyformat.info/
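To spell out why the version in the question failed: the word gene inside the quoted string is just literal text, so the request always went to /sequence/id/gene and the variable read from sys.argv was never used. A minimal sketch of two equivalent ways to substitute the value (the identifier is the transcript ID from the REST example in the question; the variable names are only for illustration):

```python
# Transcript ID taken from the REST example in the question.
identifier = "ENST00000288602"

# Old-style % formatting, as used in the answer above.
percent_style = "/sequence/id/%s?type=cds" % identifier

# str.format(), which works in both Python 2.7 and Python 3.
format_style = "/sequence/id/{}?type=cds".format(identifier)

# Both build the same endpoint path.
print(percent_style)
print(format_style)
```

Either form gives the same path, so the choice is mostly a matter of taste.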

ADD COMMENT • link 7.1 years ago by Alex Reynolds 35k

0

Fantastic, many thanks for the reply and the references. This performed beautifully for ENSG00000169174; however, some other Ensembl IDs throw the following error:

Traceback (most recent call last):
  File "WEEK10.py", line 15, in <module>
    r.raise_for_status()
  File "/Users/oars/anaconda/lib/python2.7/site-packages/requests/models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://rest.ensembl.org/sequence/id/ENSG00000186642?type=cds

Perhaps this error is coming from Ensembl's API rather than from the structure of the script?

ADD REPLY • link 7.1 years ago by oars ▴ 200

1

Not sure, but perhaps you need to query a different way? The same ID works when the request leaves out type=cds:

$ wget --header="Content-Type:text/x-fasta" http://rest.ensembl.org/sequence/id/ENSG00000186642
>ENSG00000186642 chromosome:GRCh38:11:72576141:72674591:-1
GTTTATCTCTCAGTCTCTCTGTCTGTGAGTCTTTTTTCCTCTCTCCCAGTCAGACTCTCT
CTCTACCCCTCCCTCTCTCCCTCTCTCCCTCTCTGTCTGGGCCTCTCTCTGTTCCTCCTC
...
GTGAAGGTGTCTCCAACAGGCTTGATGTGTAGGCATTATTGTAAGTTTGCAACTTCTTGG

I don't really grok Ensembl, but there's someone on here who can probably help you with debugging their REST API.

ADD REPLY • link 7.1 years ago by Alex Reynolds 35k

0

Thanks again Alex. I think my issue was that gene IDs (often) won't map to a CDS sequence. Instead, I switched all references in the code from gene to transcript_id. Everything seems to work!

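import requests, sys, errno

server = "http://rest.ensembl.org"

# sys.argv[1] raises IndexError when no argument is given, so check the length first
if len(sys.argv) < 2:
  sys.exit(errno.EINVAL)

transcript_id = sys.argv[1]

# type=cds is requested for a transcript ID (ENST...) rather than a gene ID
ext = "/sequence/id/%s?type=cds" % (transcript_id)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

# raise_for_status() raises requests.exceptions.HTTPError on a 4xx/5xx response
if not r.ok:
  r.raise_for_status()
  sys.exit()

print(r.text)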
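
For the original goal (the CDS of every transcript of a gene given on the command line), one possible approach is to first ask the lookup endpoint for the gene's transcripts and then request the CDS of each one. This is only a rough sketch under a few assumptions: it targets Python 3 and uses the standard library's urllib instead of requests, the helper names (http_get, transcript_ids, gene_cds_fasta) are made up for illustration, and it treats a 400 response for a transcript as "no CDS available", which is a guess based on the error seen above rather than documented behaviour.

```python
import json
import urllib.error
import urllib.request

SERVER = "https://rest.ensembl.org"


def http_get(url, content_type):
    """Fetch url and return the body as text (HTTPError is raised on 4xx/5xx)."""
    request = urllib.request.Request(url, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def transcript_ids(gene_record):
    """Return the transcript IDs listed in a parsed lookup response."""
    return [t["id"] for t in gene_record.get("Transcript", [])]


def gene_cds_fasta(gene_id, get=http_get):
    """Return FASTA text holding the CDS of each transcript of gene_id."""
    # expand=1 asks the lookup endpoint to include the gene's transcripts.
    lookup = get("%s/lookup/id/%s?expand=1" % (SERVER, gene_id), "application/json")
    records = []
    for transcript_id in transcript_ids(json.loads(lookup)):
        url = "%s/sequence/id/%s?type=cds" % (SERVER, transcript_id)
        try:
            records.append(get(url, "text/x-fasta").rstrip("\n"))
        except urllib.error.HTTPError as err:
            # Assumption: 400 means this transcript has no CDS, so skip it.
            if err.code != 400:
                raise
    return "\n".join(records)
```

Adding import sys and a final line such as print(gene_cds_fasta(sys.argv[1])) would turn this into a script that can be run as python file.py ENSG00000186642. The get parameter exists only so the function can be tried out with a stand-in fetcher instead of the live server.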

ADD REPLY • link 7.1 years ago by oars ▴ 200