Question: extracting the exons coordinates on hg38
2.5 years ago by
Palo Alto, CA, USA
Bogdan970 wrote:

Dear all,

Could you please advise: how can we obtain the exon coordinates of the RefSeq or UCSC genes (canonical isoforms) on hg38, so that each set of coordinates (chr, start, end) is also annotated with the gene name?

thanks a lot, and a happy weekend,

-- bogdan

exome genome • 2.1k views

Hello Bogdan!

It appears that your post has been cross-posted to another site:

This is typically not recommended as it runs the risk of annoying people in both communities.

written 2.5 years ago by WouterDeCoster43k

Dear gentlemen, thank you for your replies; I very much appreciate your help ;). Yes, I knew of the previous postings on extracting the hg19 exon coordinates before I emailed, although they apply a bit differently to hg38 and to RefSeq genes.

I thought that we might find two solutions to the same question: one in BioC (using GenomicFeatures) and one outside BioC, which I could compare afterwards.

Thanks again for your help, and happy weekend ;) !

modified 2.5 years ago • written 2.5 years ago by Bogdan970

Hi Bogdan

I don't know whether you completed this in the end, but for anyone else working with hg38, I would recommend following the blog post linked above for hg19, using the GENCODE v29 track instead of UCSC genes to download the canonical transcripts and exons file.

Follow the instructions as written and just swap the tracks. The code given in that blog post should work too; I had no errors.
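
For readers who would rather start from a GENCODE GTF annotation file than from a track download, here is a minimal Python sketch of pulling out exon coordinates together with the gene name. It is not the code from the blog post. It assumes the standard nine-column, tab-separated GTF layout with a `gene_name "..."` attribute; the helper name `exons_from_gtf` and the two example records are made up for illustration. It reports every exon of every transcript and does not select canonical isoforms.

```python
import re

# Hypothetical helper, not part of any library: turns GTF exon records into
# (chrom, start, end, gene_name) tuples with BED-style 0-based starts.
GENE_NAME = re.compile(r'gene_name "([^"]+)"')

def exons_from_gtf(lines):
    """Yield (chrom, start0, end, gene_name) for every exon line of a GTF."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = GENE_NAME.search(fields[8])
        if match is None:
            continue  # no gene_name attribute to report
        # GTF is 1-based inclusive; BED is 0-based half-open
        yield fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)

# Made-up records in GTF layout, for illustration only
example = [
    '##description: toy example\n',
    'chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";\n',
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENEA";\n',
    'chr1\tHAVANA\texon\t300\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENEA";\n',
]

rows = list(exons_from_gtf(example))
for chrom, start, end, name in rows:
    print(chrom, start, end, name, sep="\t")
```

On a real annotation file the same generator can be fed an open file handle instead of the `example` list. Restricting the output to one transcript per gene would need an extra filter that this sketch leaves out.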

Hope that helps


written 11 months ago by s16671530
2.5 years ago by
chen1.9k wrote:

Try OpenGene.jl, a library written in Julia:

using OpenGene, OpenGene.Reference

# load the gencode dataset, it will download a file from gencode website if it's not downloaded before
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh38")

genes = gencode_genes(index, "TP53")
tp53 = genes[1]
exons = tp53.transcripts[1].exons
# print the exons, one per line, as tab-separated number, start and end
for exon in exons
    println(exon.number, "\t", exon.start_pos, "\t", exon.end_pos)
end