I'm experimenting with the Ensembl API and trying to write a script where I can specify a gene (using the Ensembl ID) via a command line argument. Specifically, I'm trying to extract the CDS sequence for each transcript associated with a gene provided via the command-line argument.
From the REST API website, I found the following script for retrieving the CDS sequence:
import requests, sys
server = "http://rest.ensembl.org"
ext = "/sequence/id/ENST00000288602?type=cds"
r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})
if not r.ok:
    r.raise_for_status()
    sys.exit()
print(r.text)
The code above works perfectly; however, I can't quite get the command-line argument version to work. So far this is what I've got:
import requests, sys
server = "http://rest.ensembl.org"
ext = "/sequence/id/gene?type=cds"
gene=sys.argv[1]
r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})
if not r.ok:
    r.raise_for_status()
    sys.exit()
print(r.text)
I think I'm close? Maybe not??
My command-line call is simply:
$ python file.py <Ensembl gene ID> (e.g. ENSG00000186642)
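One likely culprit in the attempt above: the word gene inside the quoted ext string is literal text, so the value read from sys.argv[1] never reaches the URL. Below is a minimal sketch of one way to splice the argument in. The helper names build_cds_url and fetch_cds are made up for illustration, and requests is imported inside fetch_cds only so the URL helper can be tried without the package installed.

```python
import sys

SERVER = "http://rest.ensembl.org"


def build_cds_url(stable_id):
    """Build the sequence endpoint URL for one Ensembl stable ID (hypothetical helper)."""
    return "{}/sequence/id/{}?type=cds".format(SERVER, stable_id)


def fetch_cds(stable_id):
    """Request the CDS as FASTA text; raises an HTTPError on a bad response."""
    import requests  # third-party; imported lazily so build_cds_url works without it

    r = requests.get(build_cds_url(stable_id),
                     headers={"Content-Type": "text/x-fasta"})
    r.raise_for_status()
    return r.text


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # The ID typed on the command line ends up inside the URL.
        print(fetch_cds(sys.argv[1]))
    else:
        print("usage: python file.py <Ensembl stable ID>")
```

Called as python file.py ENST00000288602, this should request the same URL as the hard-coded example from the REST API site.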
Fantastic, many thanks for the reply and the references. This performed beautifully for ENSG00000169174; however, some other Ensembl IDs throw the following error:
Perhaps this is coming from Ensembl's API and not with the structure of the script?
Not sure if you need to query a different way?
I don't really grok Ensembl, but there's someone on here who can probably help you with debugging their REST API.
Thanks again, Alex. I think my issue was that gene IDs (often) won't map to a CDS sequence. Instead, I switched all references in the code from gene to transcript_id. Everything seems to work!
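For anyone landing here with only a gene ID, here is a rough sketch of going from the gene to its transcript IDs first and then requesting each CDS. It assumes the REST API's /lookup/id endpoint with expand=1 returns JSON containing a "Transcript" list, and that coding transcripts carry a "Translation" entry; both are assumptions worth checking against the current Ensembl documentation. The function names are hypothetical.

```python
import sys

SERVER = "http://rest.ensembl.org"


def coding_transcript_ids(gene_record):
    """Return IDs of transcripts that have a translation (assumed JSON layout)."""
    transcripts = gene_record.get("Transcript", [])
    return [t["id"] for t in transcripts if "Translation" in t]


def fetch_gene_cds(gene_id):
    """Look up a gene's transcripts, then fetch the CDS FASTA for each coding one."""
    import requests  # third-party; imported lazily so the helper above works without it

    lookup = requests.get(
        "{}/lookup/id/{}?expand=1".format(SERVER, gene_id),
        headers={"Content-Type": "application/json"},
    )
    lookup.raise_for_status()

    records = []
    for transcript_id in coding_transcript_ids(lookup.json()):
        r = requests.get(
            "{}/sequence/id/{}?type=cds".format(SERVER, transcript_id),
            headers={"Content-Type": "text/x-fasta"},
        )
        r.raise_for_status()
        records.append(r.text)
    return records


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("".join(fetch_gene_cds(sys.argv[1])))
    else:
        print("usage: python file.py <Ensembl gene ID>")
```

Transcripts without a translation are skipped rather than requested, since a non-coding transcript has no CDS to return.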