Question: How to get the 3' UTR given a gene ID and transcript ID in Ensembl using Python?
3
Sam (70) wrote, 8.4 years ago:

Hello Everyone,

For my current project I have to collect all the 3' UTR sequences for a list of Ensembl gene IDs and Ensembl transcript IDs. I am new to this field and was wondering whether there is an easier way to do this than fetching each 3' UTR manually.

I did explore Ensembl BioMart; however, I could not figure out exactly how to input both the gene ID and the transcript ID.

Also, would there be a way to incorporate this into Python? I know SQL at a beginner-to-intermediate level.

Thank you very much in advance.

ensembl biomart utr api • 6.7k views
modified 3.5 years ago by alex.rubinsteyn (130) • written 8.4 years ago by Sam (70)
2
alex.rubinsteyn (130), United States, wrote, 3.5 years ago:

A possibly more convenient solution: https://github.com/hammerlab/pyensembl
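
A minimal sketch of how the pyensembl route might look, assuming pyensembl is installed and the annotation data for the chosen release has already been downloaded (for example with `pyensembl install --release 75 --species human`). The release number 75 and the helper names `load_genome` and `three_prime_utrs` are illustrative choices, not something from the thread; pick the release your IDs come from.

```python
def load_genome(release=75):
    """Return a pyensembl genome object for an Ensembl release (75 is an assumption)."""
    # Imported lazily so three_prime_utrs() can be used and tested
    # without pyensembl being installed.
    from pyensembl import EnsemblRelease
    return EnsemblRelease(release)


def three_prime_utrs(genome, id_pairs):
    """Map each transcript ID to its 3' UTR sequence.

    id_pairs is an iterable of (gene_id, transcript_id) tuples.
    The value is None when pyensembl does not consider the transcript a
    complete coding transcript, since it then has no well-defined 3' UTR.
    """
    utrs = {}
    for gene_id, transcript_id in id_pairs:
        transcript = genome.transcript_by_id(transcript_id)
        # The gene ID is redundant, so it is only used as a sanity check.
        if transcript.gene_id != gene_id:
            raise ValueError(
                f"{transcript_id} belongs to {transcript.gene_id}, not {gene_id}"
            )
        if transcript.complete:
            utrs[transcript_id] = transcript.three_prime_utr_sequence
        else:
            utrs[transcript_id] = None
    return utrs
```

With real data this would be called as `three_prime_utrs(load_genome(), pairs)`, where `pairs` is your own list of (gene ID, transcript ID) tuples.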

1
Darked89 (4.2k), Barcelona, Spain, wrote, 8.4 years ago:

Biomart: http://www.ensembl.org/biomart/martview

Database: Ensembl Genes 61 Dataset: Homo sapiens genes

Filters: tick the "ID list limit" check box. There is a text area where you can paste Ensembl gene IDs, e.g. ENSG00000139618

Attributes: Sequences

Then below select: 3' UTR

Click Count (top left of the page) as a quick check, then go to Results.

1
Michael Dondrup (46k), Bergen, Norway, wrote, 8.4 years ago:

You do not need to enter both the gene ID and the transcript ID: the transcript ID is unique and therefore sufficient. A gene can have multiple transcripts, but each transcript is assigned to only one gene, so once you have a transcript ID the gene ID is redundant.

BioMart has some built-in tools to automate queries. Here is an exported URL to a BioMart query, like the one described by Darked89 but with transcript IDs as the filter instead of Ensembl gene IDs. If you follow the link, you have a query you can start with. You can also get this query as XML or directly as a Perl script. If you really must use Python, there is more documentation on how to use the REST or SOAP interface.
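
A rough sketch of sending such a query from Python with only the standard library. It assumes the BioMart service is reachable at the `martservice` path next to `martview`, that the human dataset is named `hsapiens_gene_ensembl`, and that the filter and attribute names (`ensembl_transcript_id`, `ensembl_gene_id`, `3utr`) match the XML that BioMart exports for this query; check them against your own exported XML. The function names are made up for illustration.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: the martservice path on the Ensembl site.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_utr3_query(ids, filter_name="ensembl_transcript_id",
                     dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query asking for 3' UTR sequences in FASTA format.

    Pass filter_name="ensembl_gene_id" to filter on gene IDs instead.
    """
    id_list = ",".join(ids)  # BioMart expects the IDs separated by commas
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="FASTA" '
        'header="0" uniqueRows="0" datasetConfigVersion="0.6">'
        f'<Dataset name="{dataset}" interface="default">'
        f'<Filter name="{filter_name}" value="{id_list}"/>'
        '<Attribute name="ensembl_gene_id"/>'
        '<Attribute name="ensembl_transcript_id"/>'
        '<Attribute name="3utr"/>'
        "</Dataset>"
        "</Query>"
    )


def fetch_utr3_fasta(ids, filter_name="ensembl_transcript_id"):
    """POST the query to BioMart and return the response text (multi-FASTA)."""
    payload = urllib.parse.urlencode(
        {"query": build_utr3_query(ids, filter_name)}
    ).encode("ascii")
    # Sending the query as POST data keeps long ID lists out of the URL.
    with urllib.request.urlopen(MART_URL, data=payload) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_utr3_fasta(["ENST...", "ENST..."])` with your own transcript IDs should return the same kind of text you see on the BioMart Results page.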

0
Sam (70) wrote, 8.4 years ago:

Darked89 and Michael Dondrup: thank you for answering my question.

Michael: I tried it the way you suggested. However, I only get back one result after entering all my transcript IDs. (Example link with multiple IDs, a bit shorter.)

Here is the [file with all the TransID][2]

Thank you for your time.

[2]: http://www.filedropper.com/transid "file with all the transID"

modified 8.4 years ago by Michael Dondrup (46k) • written 8.4 years ago by Sam (70)
1

Argh! I realise your reply was too large to paste in as a comment, but please do not add an answer if you're trying to address comments. If you want to paste a large amount of text into a comment on someone else's reply, please use something like http://pastebin.com/ and point a link to the output you would like us to look at!

written 8.4 years ago by Daniel Swan (13k)

No, your link works fine, the result is a multi-fasta file, scroll down ;)

written 8.4 years ago by Michael Dondrup (46k)
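
Since the result is a multi-FASTA file, a small parser can turn it into a dictionary for further work in Python. This is only a sketch: the function name `parse_fasta` is made up, and the header layout in the example (gene ID and transcript ID separated by `|`) is an assumption about how BioMart labels the records, so check it against your own output.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict mapping header line -> sequence.

    The header is stored without the leading '>'. Sequence lines that are
    wrapped over several lines are joined into one string.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the final record.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example input; real IDs and sequences will differ.
example = ">ENSG_A|ENST_1\nACGT\nTTAA\n>ENSG_A|ENST_2\nGGCC\n"
utr_by_header = parse_fasta(example)
```

If the headers do follow the assumed layout, `header.split("|")` gives the gene ID and transcript ID for each record.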

He tried to do it right, but the very long URL broke the formatting. I put in a shorter URL.

written 8.4 years ago by Michael Dondrup (46k)

Daniel Swan: I'm sorry about making a new comment. I am new to this website and I did not want to paste the huge link into the comment section, so I thought I would make a new comment. I will keep this in mind in future. Thank you.

written 8.4 years ago by Sam (70)

Michael: Thank you very, very much. After looking at your shortened link I realized that I was not putting commas after every transcript ID, so it was not querying all the IDs. You put in 5 IDs separated by commas, so it queried and returned all of them. I put all my IDs in that format and I got the answer! Thank you very much for all your time and effort. :)

written 8.4 years ago by Sam (70)
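
The comma problem described above can be avoided by building the ID list programmatically. A small sketch, assuming the IDs sit in a text file or string separated by newlines, spaces, or commas; the function name `ids_to_filter_value` is made up for illustration.

```python
import re


def ids_to_filter_value(text):
    """Turn whitespace- or comma-separated IDs into one comma-separated string.

    Duplicates are dropped and the original order is kept.
    """
    ids = []
    seen = set()
    for token in re.split(r"[\s,]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ",".join(ids)


# Made-up IDs, one per line with a duplicate, as they might appear in a file.
filter_value = ids_to_filter_value("ENST_1\nENST_2\nENST_1\nENST_3\n")
```

The resulting string can be pasted into the query URL or used as the filter value in the XML query.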