Extracting exon coordinates on hg38
3.5 years ago
Bogdan ★ 1.1k

Dear all,

Could you please advise: how can we obtain the exon coordinates of the RefSeq or UCSC genes (canonical isoforms) on hg38, where each set of coordinates (chr, start, end) also has the gene name assigned?

thanks a lot, and a happy weekend,

-- bogdan

genome exome • 3.1k views

Hello Bogdan!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/101333/

This is typically not recommended as it runs the risk of annoying people in both communities.


Dear gentlemen, thank you for your replies; I very much appreciate your help ;). Yes, I knew about the previous postings on extracting the hg19 exon coordinates before I emailed, although things apply a bit differently to hg38 and to RefSeq genes.

I thought that we might find two solutions to the same question: one in BioC (using GenomicFeatures) and one not related to BioC, which I can compare afterwards.

Thanks again for your help, and happy weekend ;) !


Hi Bogdan

I don't know whether you completed this in the end, but for anyone else who is trying to use hg38, I would recommend following the blog post linked above for hg19, but using the GENCODE v29 track instead of UCSC genes to download the canonical transcripts and exons file.

Follow the instructions as written but just change the tracks over. The code given in that previous blog post should work too. I had no errors.
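
For readers who end up with a genePred-style table from the Table Browser rather than a ready-made exon file, here is a minimal Python sketch of the expansion step. It assumes a tab-separated export with a header row containing `name`, `chrom`, `exonStarts`, `exonEnds` and `name2` (gene symbol) columns; the exact column set depends on the track you pick, and the sample rows are made-up toy values, not real annotation. The `exon_rows` helper and its `canonical_ids` filter are hypothetical names used only for illustration.

```python
import csv
import io

# Toy genePred-style rows (made-up coordinates, NOT real annotation).
# Assumed columns: name, chrom, strand, exonStarts, exonEnds, name2
SAMPLE = (
    "name\tchrom\tstrand\texonStarts\texonEnds\tname2\n"
    "TX1\tchr1\t+\t100,300,500,\t200,400,600,\tGENEA\n"
    "TX2\tchr2\t-\t1000,\t1500,\tGENEB\n"
)


def exon_rows(handle, canonical_ids=None):
    """Yield (chrom, start, end, gene) for every exon in a genePred-style table.

    canonical_ids: optional set of transcript IDs to keep (for example the
    IDs from a canonical-transcripts download); None keeps every transcript.
    """
    for rec in csv.DictReader(handle, delimiter="\t"):
        if canonical_ids is not None and rec["name"] not in canonical_ids:
            continue
        # UCSC stores exon boundaries as comma-separated lists with a
        # trailing comma, so empty pieces are dropped before conversion.
        starts = [int(s) for s in rec["exonStarts"].split(",") if s]
        ends = [int(e) for e in rec["exonEnds"].split(",") if e]
        for start, end in zip(starts, ends):
            yield rec["chrom"], start, end, rec["name2"]


exons = list(exon_rows(io.StringIO(SAMPLE)))
for chrom, start, end, gene in exons:
    print(chrom, start, end, gene, sep="\t")
```

To use it on a real download, pass an open file handle instead of the `io.StringIO` sample. Note that UCSC genePred starts are 0-based and ends are exclusive, which matches BED conventions.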

Hope that helps

Lloyd

3.5 years ago
chen ★ 2.1k

Try OpenGene.jl, a library written in Julia (https://github.com/OpenGene/OpenGene.jl).

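using OpenGene, OpenGene.Reference

# load the GENCODE index first; this call is assumed from the OpenGene.jl README
# (check there for the supported assembly names)
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh38")

# look up the gene by name and take the first match
genes = gencode_genes(index, "TP53")
tp53 = genes[1]
exons = tp53.transcripts[1].exons

# print the exons (number, start, end), tab-separated
for exon in exons
    println(exon.number, "\t", exon.start_pos, "\t", exon.end_pos)
end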