Question

Pythonic way to get gene_id and gene_symbol from a given list of gene/transcript coordinates

0

8.2 years ago

badribio ▴ 290

Are there any quick methods to get the Ensembl gene_id and gene_symbol for a list of (transcript) coordinates? This post, A: Identify gene symbols given a list of chromosome positions, covers UCSC only.

python • 2.1k views

ADD COMMENT • link 8.2 years ago by badribio ▴ 290

2

Hello,

You could use Ensembl's REST API for this, e.g. the Overlap endpoint.

fin swimmer

ADD REPLY • link 8.2 years ago by finswimmer 16k

0

Is there any snippet for querying the REST API using Python?

ADD REPLY • link 8.2 years ago by badribio ▴ 290
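
For reference, a minimal sketch of querying the Overlap endpoint suggested above, using only the standard library. The helper names (`overlap_url`, `parse_genes`, `genes_overlapping`) are made up for illustration, and the sketch assumes the JSON records for gene features carry `id` and `external_name` keys. Ensembl names chromosomes without the `chr` prefix, so the prefix is stripped before building the URL.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def overlap_url(chrom, start, end, species="human", feature="gene"):
    """Build the Overlap endpoint URL for one region (1-based, inclusive)."""
    # Ensembl uses "1", "X", ... rather than "chr1", "chrX", ...
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{ENSEMBL_REST}/overlap/region/{species}/{chrom}:{start}-{end}?feature={feature}"


def parse_genes(records):
    """Reduce the decoded JSON records to (gene_id, gene_symbol) pairs."""
    # external_name can be absent for unnamed genes, hence .get()
    return [(rec.get("id"), rec.get("external_name")) for rec in records]


def genes_overlapping(chrom, start, end, species="human"):
    """Fetch the genes overlapping one region (performs a network request)."""
    request = urllib.request.Request(
        overlap_url(chrom, start, end, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_genes(json.load(response))
```

Calling `genes_overlapping("chr7", 140424943, 140624564)` issues one request per region, so a long list of coordinates means many remote queries; a local GTF is likely the better fit in that case.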

score 1 · Answer 1 · 2017-04-20

1

8.2 years ago

Devon Ryan 105k

I'd be surprised if this couldn't be done with biomart.

In fact, here is a very simple example using GRCh38.

ADD COMMENT • link 8.2 years ago by Devon Ryan 105k

0

BioMart has a limit of 500 queries (I may be wrong here), and I have a lot of lines, as these are transcript-level coordinates.

ADD REPLY • link 8.2 years ago by badribio ▴ 290

1

BTW, if for some reason you really want a python-based solution then download a GTF file and:

pip install deeptools

then in python

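from deeptoolsintervals import GTF
# transcriptID is the feature type (GTF column 3); transcript_id_designator is the attribute key holding the ID
anno = GTF("foo.gtf", transcriptID="gene", transcript_id_designator="gene_id")
# Look up the features overlapping chr1:1-1000
anno.findOverlaps("chr1", 1, 1000)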

That will get you the gene_id field and coordinate information. The python wrapper doesn't allow access to the symbol, so you'd need to just download the mapping from biomart.

If you don't want to perform a bunch of remote queries then something along those lines would work. I never really intended for others to use that python module, but if you ever want to it's documented here.

ADD REPLY • link 8.2 years ago by Devon Ryan 105k
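
For the gene_id to symbol step mentioned above, a minimal sketch of loading a mapping downloaded from BioMart into a dictionary. It assumes a tab-separated export with a header row; the column names `Gene stable ID` and `Gene name` and the function name `load_symbol_map` are assumptions, so adjust them to match the actual file.

```python
import csv


def load_symbol_map(handle, id_col="Gene stable ID", name_col="Gene name"):
    """Read a tab-separated gene_id/symbol table into a dict.

    `handle` is any iterable of lines, e.g. an open file.
    The default column names are assumptions about the export header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Skip rows without a gene ID; keep the symbol as-is (it may be empty)
    return {row[id_col]: row[name_col] for row in reader if row[id_col]}
```

With the table saved as, say, `mapping.tsv`, something like `with open("mapping.tsv", newline="") as fh: symbols = load_symbol_map(fh)` followed by `symbols.get(gene_id)` would translate each ID returned by the overlap lookup.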

0

Thanks, I will try this out. I need a Python solution because I have to modify a pipeline that is written in Python; otherwise bedtools would have been my first choice. Having said that, pybedtools should also do the job, if I am not wrong.

ADD REPLY • link 8.2 years ago by badribio ▴ 290

0

Yeah, pybedtools would have been my other suggestion.

ADD REPLY • link 8.2 years ago by Devon Ryan 105k

0

At that point don't you have the transcript IDs? Then you don't need to look anything up with coordinates, you just need to convert the transcript to gene ID (also available on biomart).

ADD REPLY • link 8.2 years ago by Devon Ryan 105k

0

Nope, I only have the coordinates, just like the output in a TopHat junctions.bed file.

ADD REPLY • link 8.2 years ago by badribio ▴ 290

1

I expect that bedtools intersect and a bit of awk will turn out to be the simplest solution :P

ADD REPLY • link 8.2 years ago by Devon Ryan 105k
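
For a pipeline that has to stay in Python, the intersect-then-extract idea can be sketched without bedtools at all. This is a plain-Python stand-in, not bedtools or pybedtools itself: it reads the `gene` lines of a GTF, pulls `gene_id` and `gene_name` from the attribute column, and reports every gene overlapping each BED interval. The function names are made up for illustration, and the sketch assumes the GTF carries a `gene_name` attribute (Ensembl GTFs do) and that both files use the same chromosome naming.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def load_genes(gtf_lines):
    """Collect (start, end, gene_id, gene_name) per chromosome from 'gene' lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        genes[cols[0]].append(
            (int(cols[3]), int(cols[4]), attrs.get("gene_id"), attrs.get("gene_name"))
        )
    return genes


def annotate_bed(bed_lines, genes):
    """Yield (chrom, start, end, gene_id, gene_name) for each interval/gene overlap."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)  # BED is 0-based, half-open
        for g_start, g_end, gene_id, gene_name in genes.get(chrom, []):
            # GTF is 1-based, inclusive: the gene spans [g_start - 1, g_end) in BED terms
            if start < g_end and end > g_start - 1:
                yield chrom, start, end, gene_id, gene_name
```

Usage would be along the lines of `genes = load_genes(open("foo.gtf"))` and then iterating over `annotate_bed(open("junctions.bed"), genes)`. The scan is linear per interval, which is fine for a sketch but slow for large inputs; an interval tree or bedtools itself would scale better.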