Get Exons & Introns Using Ensembl REST API
10.4 years ago
Gungor Budak ▴ 270

Hello all

I have some gene IDs from Ensembl and I want to get their transcripts' exons and introns (sequences) so that later I can determine exon/intron boundaries and do some analyses.

I've discovered the Ensembl REST API, which is a really easy and clean way of getting data, and played around with it a bit. Using this API, I could get the coding transcripts of the genes and then the sequences of these transcripts. However, I couldn't find any way to distinguish exonic and intronic regions in these sequences.

Here is my script, which gets the sequences of the transcripts of the gene "ENSG00000197568" in FASTA format. I want to get exons and introns the way Ensembl shows them here.

#!/usr/local/bin/python

import httplib2, sys, re

def check_response(response):
    # Abort with a non-zero exit code on any non-200 HTTP status
    if response.status != 200:
        print("Invalid response: %s" % response.status)
        sys.exit(1)

http = httplib2.Http(".cache")
server = "http://beta.rest.ensembl.org/"
gene_id = "ENSG00000197568"
# multiple_sequences=1 returns one CDS record per transcript of the gene
query = "sequence/id/" + gene_id + "?type=cds;multiple_sequences=1"
content_type = "text/x-fasta"

response, content = http.request(server + query, method="GET", headers={"Content-Type":content_type})
check_response(response)

# Each FASTA header starts with a transcript ID; keep only the ID token
transcripts = re.findall(r">(\S+)", content.decode("utf-8"))
# "wa" is not a valid mode; write the raw response bytes to a fresh file
f = open("output.fasta", "wb")

for transcript in transcripts:
    # Without type=cds this returns the transcript's default sequence type
    query = "sequence/id/" + transcript
    response, content = http.request(server + query, method="GET", headers={"Content-Type":content_type})
    check_response(response)
    f.write(content)

f.close()

Thanks in advance

exon ensembl intron • 6.2k views
9.6 years ago

Hi Gungor,

We have taken your suggestion into account and added an option to soft-mask intronic regions.

This option is available on our new REST server, http://rest.ensembl.org, along with improved performance.

The following: http://rest.ensembl.org/sequence/id/ENSG00000157764?content-type=text/plain;mask_feature=1

will return the whole gene sequence, with intron sequences in lower case.
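As a rough sketch of how that soft-masked output could be turned into exon/intron boundaries, the Python 3 snippet below splits a sequence wherever the case changes. The function names `fetch_masked_sequence` and `split_by_case` are made up for illustration, and the labelling assumes that every upper-case run is exonic and every lower-case run is intronic, as described above. Offsets are relative to the returned sequence, not genomic coordinates.

```python
import itertools
import urllib.request

SERVER = "http://rest.ensembl.org"  # server named in this answer


def fetch_masked_sequence(stable_id):
    """Fetch a genomic sequence with introns in lower case (needs network access)."""
    url = "{}/sequence/id/{}?content-type=text/plain;mask_feature=1".format(
        SERVER, stable_id
    )
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8").strip()


def split_by_case(sequence):
    """Split a soft-masked sequence into (kind, start, end, bases) tuples.

    Upper-case runs are labelled "exon" and lower-case runs "intron".
    start and end are 0-based, end-exclusive offsets into the sequence.
    """
    segments = []
    position = 0
    # groupby yields consecutive runs of characters sharing the same case
    for is_lower, run in itertools.groupby(sequence, key=str.islower):
        bases = "".join(run)
        kind = "intron" if is_lower else "exon"
        segments.append((kind, position, position + len(bases), bases))
        position += len(bases)
    return segments
```

For example, `split_by_case("ATGGgtaagCAG")` gives an exon at 0-4, an intron at 4-9 and an exon at 9-12. On real data, `split_by_case(fetch_masked_sequence("ENSG00000157764"))` would be the equivalent call.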

In your example, http://rest.ensembl.org/sequence/id/ENSG00000197568?content-type=text/plain;type=cds;multiple_sequences=1

you are already retrieving only the coding sequence for each transcript in the gene. Hence, there are no intronic regions.

I hope this helps and please do not hesitate to contact us if you have any further enquiries.

Regards,
Magali

10.4 years ago
Emily 23k

Hi Gungor

I'm afraid we don't have a straightforward option of downloading a gene sequence in that format using the REST API at present. The service is still in its beta phase, so is not yet at its full capability. We're trying to prioritise functionality that we know users are interested in, so we will take your feedback into account when deciding which endpoints we want to add next.

You can get the exons using the sequence/id method.
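One possible way to work out intron boundaries from exon records is sketched below in Python 3. It assumes the `lookup/id` endpoint with `expand=1` is available on the server and returns a `Transcript` list whose entries carry an `Exon` list with `start` and `end` fields; that is an assumption about the current service, not something confirmed in this thread. The helper names `fetch_transcripts` and `intron_intervals` are hypothetical.

```python
import json
import urllib.request


def fetch_transcripts(gene_id, server="http://rest.ensembl.org"):
    """Return the transcript records of a gene (needs network access).

    Assumes lookup/id with expand=1 nests transcripts under "Transcript".
    """
    url = "{}/lookup/id/{}?expand=1;content-type=application/json".format(
        server, gene_id
    )
    with urllib.request.urlopen(url) as response:
        record = json.loads(response.read().decode("utf-8"))
    return record.get("Transcript", [])


def intron_intervals(exons):
    """Return (start, end) gaps between consecutive exons of one transcript.

    Coordinates are treated as 1-based and inclusive; exons are sorted by
    start, so the result is in genomic order regardless of strand.
    """
    ordered = sorted(exons, key=lambda exon: exon["start"])
    introns = []
    for left, right in zip(ordered, ordered[1:]):
        # Skip adjacent or overlapping exons, which leave no gap
        if right["start"] > left["end"] + 1:
            introns.append((left["end"] + 1, right["start"] - 1))
    return introns
```

With exons at 100-200, 301-400 and 500-600, `intron_intervals` returns `[(201, 300), (401, 499)]`. Each exon ID from the lookup could then be passed to sequence/id to fetch its sequence.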

If you're a perl programmer, this data is very easy to get via the Perl API, which I can help you with if needed.

Emily


The REST API is really cool, and I can't wait to see it fully functional. I'll try the Perl API. I actually installed it, but I got lost in the Perl classes and data types, and it seemed a bit slow. If it's the only option, though, I will look at it again and let you know if I have questions. Thanks, Emily.


Have you seen our new online course? There are various scripts in there that you can cannibalise to make life easier.


Yes, I have, and I've started watching the tutorials and doing the exercises. It'll definitely help. Thanks.
