Question: Pythonic way to get gene_id and gene_symbol from a given list of gene/transcript coordinates
2.5 years ago, badribio wrote:

Are there any quick methods to get the Ensembl gene_id and gene_symbol for a list of (transcript) coordinates? The post "A: Identify gene symbols given a list of chromosome positions" covers UCSC only.

python • 696 views
written 2.5 years ago by badribio

Hello,

You could use Ensembl's REST API for this, e.g. the Overlap endpoint.

fin swimmer

modified 2.5 years ago • written 2.5 years ago by finswimmer

Is there a snippet for querying the REST API using Python?

written 2.5 years ago by badribio
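
For reference, a minimal sketch of such a query using only the standard library. It assumes the public server at https://rest.ensembl.org and the GET overlap/region/:species/:region endpoint with feature=gene, and it assumes each returned gene record carries "id" and "external_name" fields. The helper names (overlap_url, extract_ids, genes_overlapping) are made up for illustration. Ensembl regions are 1-based and use chromosome names without the "chr" prefix, so BED-style input would need converting first.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public server


def overlap_url(chrom, start, end, species="human", feature="gene"):
    """Build the Overlap-endpoint URL for one region (1-based, inclusive)."""
    # Ensembl names chromosomes "1", "X", ... so drop a UCSC-style "chr" prefix.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    query = urllib.parse.urlencode({"feature": feature})
    return f"{ENSEMBL_REST}/overlap/region/{species}/{name}:{start}-{end}?{query}"


def extract_ids(records):
    """Pull (gene_id, gene_symbol) pairs out of the decoded JSON records."""
    return [(rec["id"], rec.get("external_name", "")) for rec in records]


def genes_overlapping(chrom, start, end, species="human"):
    """Query the server for genes overlapping one region (needs network access)."""
    request = urllib.request.Request(
        overlap_url(chrom, start, end, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_ids(json.load(response))
```

Calling genes_overlapping("chr7", 140424943, 140624564) would issue one request per region, so for many thousands of intervals the server's rate limits make a local annotation file the more practical route.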
2.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

I'd be surprised if this couldn't be done with biomart.

In fact, here is a very simple example using GRCh38.

modified 2.5 years ago • written 2.5 years ago by Devon Ryan

BioMart has a limit of 500 queries (I may be wrong here), and I have a lot of lines because these are transcript-level coordinates.

written 2.5 years ago by badribio

BTW, if for some reason you really want a python-based solution then download a GTF file and:

pip install deeptools

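then in Python:

from deeptoolsintervals import GTF
# transcriptID selects the feature type (GTF column 3); transcript_id_designator is the attribute key used as the ID
anno = GTF("foo.gtf", transcriptID="gene", transcript_id_designator="gene_id")
# Look up the annotated intervals overlapping chr1:1-1000
anno.findOverlaps("chr1", 1, 1000)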

That will get you the gene_id field and coordinate information. The python wrapper doesn't allow access to the symbol, so you'd need to just download the mapping from biomart.

If you don't want to perform a bunch of remote queries then something along those lines would work. I never really intended for others to use that python module, but if you ever want to it's documented here.

modified 2.5 years ago • written 2.5 years ago by Devon Ryan
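
The symbol lookup mentioned above could be sketched as follows, assuming the mapping was exported from BioMart as a tab-separated file with a header row. The column names "Gene stable ID" and "Gene name" are an assumption about that export and may differ between releases, and load_symbol_map is a made-up helper name.

```python
import csv
import io


def load_symbol_map(handle, id_column="Gene stable ID", symbol_column="Gene name"):
    """Read a tab-separated gene_id -> symbol table into a dict.

    The default column names are assumptions about the BioMart export;
    pass different ones if the header of your file differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Skip rows without a symbol so lookups fall back to the default instead.
    return {row[id_column]: row[symbol_column] for row in reader if row[symbol_column]}


# Tiny in-memory stand-in for the downloaded file.
example = io.StringIO(
    "Gene stable ID\tGene name\n"
    "ENSG00000157764\tBRAF\n"
    "ENSG00000141510\tTP53\n"
)
symbols = load_symbol_map(example)

# gene_ids would come from the overlap step; unknown IDs map to "".
gene_ids = ["ENSG00000141510", "ENSG00000157764", "ENSG00000999999"]
annotated = [(gene_id, symbols.get(gene_id, "")) for gene_id in gene_ids]
```

With a real export the StringIO object would be replaced by open("mart_export.txt", newline=""), where the file name is again only a placeholder.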

Thanks, I will try this out. I need a Python solution because I have to modify a pipeline that is written in Python; otherwise bedtools would have been my first choice. Having said that, pybedtools should also do the job, if I am not wrong.

written 2.5 years ago by badribio

Yeah, pybedtools would have been my other suggestion.

written 2.5 years ago by Devon Ryan

At that point don't you have the transcript IDs? Then you don't need to look anything up with coordinates, you just need to convert the transcript to gene ID (also available on biomart).

written 2.5 years ago by Devon Ryan

Nope, I only have the coordinates, just like the output in a TopHat junctions.bed file.

modified 2.5 years ago • written 2.5 years ago by badribio

I expect that bedtools intersect and a bit of awk will turn out to be the simplest solution :P

written 2.5 years ago by Devon Ryan
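
For a pipeline that has to stay in Python, the intersect step can be approximated without bedtools. The following is only a rough stand-in, not bedtools itself: it assumes an Ensembl-style GTF whose "gene" lines carry gene_id and gene_name attributes, BED input with 0-based half-open coordinates, and matching chromosome names in both files. The function names load_genes and annotate are made up for illustration.

```python
import io
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def load_genes(gtf_handle):
    """Collect gene records per chromosome as (start, end, gene_id, symbol)."""
    genes = defaultdict(list)
    for line in gtf_handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # GTF is 1-based inclusive; convert to 0-based half-open like BED.
        genes[fields[0]].append(
            (int(fields[3]) - 1, int(fields[4]), attrs.get("gene_id"), attrs.get("gene_name"))
        )
    return genes


def annotate(bed_handle, genes):
    """Yield (chrom, start, end, gene_id, symbol) for every overlapping gene."""
    for line in bed_handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        start, end = int(start), int(end)
        # Linear scan per interval: fine for a sketch, slow for large inputs.
        for g_start, g_end, gene_id, symbol in genes.get(chrom, []):
            if start < g_end and g_start < end:
                yield chrom, start, end, gene_id, symbol


# Small in-memory example in place of real files.
gtf = io.StringIO(
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n'
)
bed = io.StringIO("1\t12000\t12500\n1\t14409\t14500\n")
hits = list(annotate(bed, load_genes(gtf)))
```

Intervals without any overlapping gene are silently dropped here, and the nested loop scales poorly, so for junction files with many lines bedtools, pybedtools, or an interval tree would be the better choice.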