Question

Extracting exon coordinates on hg38

6.6 years ago

Bogdan ★ 1.4k

Dear all,

Could you please advise: how can we obtain the exon coordinates of the RefSeq or UCSC genes (canonical isoforms) on hg38, such that each set of coordinates (chr, start, end) is also assigned the gene name?

thanks a lot, and a happy weekend,

-- bogdan

genome exome • 5.2k views

ADD COMMENT • link updated 6.6 years ago by chen ★ 2.5k • written 6.6 years ago by Bogdan ★ 1.4k

See "Exon Coordinates Of Hg19 Genome Download" and "How To Get Bed File Containing Exons Of Canonical Transcripts And Their Corresponding Gene Symbols".

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 161k

Hello Bogdan!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/101333/

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 6.6 years ago by WouterDeCoster 47k

Dear gentlemen, thank you for your replies; I very much appreciate your help ;). Yes, I knew about the previous posts on extracting the hg19 exon coordinates before I posted, although things work a bit differently for hg38 and for RefSeq genes.

I thought that we might find two solutions to the same question, one in BioC (using GenomicFeatures) and one not related to BioC, which I could compare afterwards.

Thanks again for your help, and happy weekend ;) !

ADD REPLY • link 6.6 years ago by Bogdan ★ 1.4k

Hi Bogdan

I don't know whether you completed this in the end, but for anyone else who is trying to use hg38, I would recommend following the blog post linked above for hg19, but using the GENCODE v29 track instead of UCSC genes to download the canonical transcripts and exons file.

Follow the instructions as written and just swap the tracks. The code given in that previous post should work too; I had no errors.

Hope that helps

Lloyd

ADD REPLY • link 5.0 years ago by s1667153 • 0
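For readers who prefer to work from a downloaded GENCODE annotation file rather than the Table Browser, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the linked posts: it assumes a GENCODE-style GTF (nine tab-separated columns, with `gene_name` and `tag` attributes), and it assumes that canonical transcripts are marked with the tag `Ensembl_canonical`, which may not be present in older releases such as v29. The function names `parse_attributes` and `exons_to_bed` are made up for this example.

```python
import re

# Matches quoted GTF attributes such as: gene_name "TP53";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Collect quoted key/value pairs; repeated keys (e.g. tag) become lists."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, []).append(value)
    return attrs


def exons_to_bed(gtf_lines, canonical_tag=None):
    """Return (chrom, start, end, gene_name) tuples for every exon record.

    GTF coordinates are 1-based and inclusive; the output follows the BED
    convention (0-based start, exclusive end), hence start - 1.
    If canonical_tag is given (assumption: "Ensembl_canonical" in recent
    GENCODE releases), only exons carrying that tag are kept.
    """
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep exon features only
        attrs = parse_attributes(cols[8])
        if canonical_tag is not None and canonical_tag not in attrs.get("tag", []):
            continue
        gene = attrs.get("gene_name", ["."])[0]
        rows.append((cols[0], int(cols[3]) - 1, int(cols[4]), gene))
    return rows


# Usage sketch (the file name is hypothetical):
#   import gzip
#   with gzip.open("gencode.annotation.gtf.gz", "rt") as handle:
#       for chrom, start, end, gene in exons_to_bed(handle, "Ensembl_canonical"):
#           print(chrom, start, end, gene, sep="\t")
```

Leaving `canonical_tag` as `None` returns the exons of every transcript, which can then be filtered against a separately downloaded list of canonical transcripts if the annotation file carries no such tag.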

score 0 · Answer 1 · 2017-10-08

Try OpenGene.jl, a library written in Julia (https://github.com/OpenGene/OpenGene.jl).

using OpenGene, OpenGene.Reference

# load the gencode dataset, it will download a file from gencode website if it's not downloaded before
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh38")

genes = gencode_genes(index, "TP53")
tp53 = genes[1]
exons = tp53.transcripts[1].exons
# print the exons, one per line (tab-separated: number, start, end)
for exon in exons
    println(exon.number, "\t", exon.start_pos, "\t", exon.end_pos)
end