Coordinates for genomic features?
Ankit, 8 months ago

Hi everyone,

Can anyone help me get the coordinates of genomic features for hg19, for example genes, exons, introns, 5' UTRs, 3' UTRs and promoters?

I can get genes and exons from a GTF file, right? The problem is the other features.
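Genes and exons are indeed listed directly in a GTF. Below is a minimal Python sketch of pulling them out. It assumes an Ensembl-style, tab-separated GTF whose attribute values are quoted (e.g. `gene_name "..."`). The helper name `gtf_records` is hypothetical. The sketch also shifts the 1-based GTF start to the 0-based start used by BED files.

```python
import re

# Quoted key/value pairs in column 9, e.g.: gene_id "G1"; gene_name "TOY1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_records(lines, wanted=("gene", "exon")):
    """Yield (chrom, bed_start, bed_end, feature, name, strand) tuples.

    GTF coordinates are 1-based and inclusive; subtracting 1 from the start
    gives the 0-based, half-open interval used by BED.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip '#!' header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in wanted:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        # Fall back to gene_id when a record has no gene_name attribute
        name = attrs.get("gene_name", attrs.get("gene_id", "."))
        yield (cols[0], int(cols[3]) - 1, int(cols[4]), cols[2], name, cols[6])
```

With a real file you would pass `open("Homo_sapiens.GRCh38.101.gtf")`, or `gzip.open(path, "rt")` for the compressed download.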

Thank you

Tags: gene, utr, promoter, intron, exon

Hi, thanks. Good suggestion.

How about the UCSC Genome Browser Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables)?

Do you think it provides the coordinates I need correctly?

I still don't know how to handle promoters, so I thought I would take the 200 bp upstream of each gene start. Does that make sense as an approximate promoter region?
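A fixed window upstream of the transcription start site (TSS) is a common rough approximation. The main pitfall is strand. For minus-strand genes, the TSS is the gene's end coordinate, and "upstream" means higher coordinates.

Below is a minimal sketch that assumes gene coordinates taken from a GTF (1-based, inclusive) and produces BED-style output (0-based, half-open). `promoter_bed` is a hypothetical helper, and the 200 bp default simply mirrors the question rather than being a recommended size.

```python
def promoter_bed(chrom, start, end, strand, upstream=200):
    """Return (chrom, bed_start, bed_end, strand) for the `upstream` bases
    immediately 5' of the TSS.

    `start`/`end` are 1-based inclusive GTF coordinates; the result is 0-based,
    half-open (BED). The TSS is `start` on '+' and `end` on '-'.
    """
    if strand == "+":
        # 1-based positions start-upstream .. start-1 -> BED [start-1-upstream, start-1)
        bed_start = max(0, start - 1 - upstream)  # clip at the chromosome start
        bed_end = start - 1
    elif strand == "-":
        # 1-based positions end+1 .. end+upstream -> BED [end, end+upstream)
        bed_start = end
        bed_end = end + upstream  # not clipped at the chromosome end
    else:
        raise ValueError(f"cannot orient an unstranded feature: {strand!r}")
    return chrom, bed_start, bed_end, strand
```

Minus-strand windows are not clipped at the chromosome end here. `bedtools flank` with `-s` and a genome file (`-g`) can build the same windows with that clipping. Using transcript records instead of gene records gives one window per alternative TSS.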


This should be posted as a comment on my original answer; that keeps the thread properly organized.

I am not entirely sure about the UCSC Table Browser, but it looks like you might be able to retrieve some of the data you want from it.

Sorry, I missed the promoter part of your original question. Ensembl has great resources that can help with this. Take a look at this Biostars post and this documentation. Based on these, you can download the human Regulatory Build from the Ensembl FTP site.

If you run the following command on that file, you can see that it annotates several types of regulatory feature besides promoters, so it seems like a great resource:

awk '{print $3}' homo_sapiens.GRCh38.Regulatory_Build.regulatory_features.20190329.gff | sort | uniq

CTCF_binding_site
TF_binding_site
enhancer
open_chromatin_region
promoter
promoter_flanking_region

Hope this helps!

jkkbuddika, 8 months ago

You can get the coordinates of genes, exons and UTRs from a GTF file downloaded from Ensembl. Download the GTF file and then run:

awk '{print $3}' Homo_sapiens.GRCh38.101.gtf | sort | uniq

CDS
exon
five_prime_utr
gene
Selenocysteine
start_codon
stop_codon
three_prime_utr
transcript
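The awk pipeline above lists only the distinct feature types, and the `#!` header lines of the GTF show up as an empty entry. Below is a minimal Python sketch of the same check that skips header lines and also counts each type, roughly what `sort | uniq -c` would report. The helper name `feature_type_counts` is hypothetical.

```python
from collections import Counter


def feature_type_counts(lines):
    """Count the values in column 3 (feature type) of a GTF/GFF stream.

    Lines starting with '#' and blank lines are skipped, so header lines do not
    produce an empty entry.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts
```

The same function works on the Regulatory Build GFF. For example, `sorted(feature_type_counts(open(path)).items())` gives an alphabetical listing.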


So if you want to get information about 5'-UTRs, you can run this:

awk -F'\t' 'BEGIN{OFS="\t"} $3 == "five_prime_utr" {print $1, $4-1, $5, $3, ".", $7}' Homo_sapiens.GRCh38.101.gtf > 5utr.bed


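This should create a file containing the 5' UTR coordinates. Note that Homo_sapiens.GRCh38.101.gtf is a GRCh38 annotation, while you asked about hg19 (GRCh37); for hg19 coordinates you would need the GRCh37 version of the Ensembl GTF. Take a look at this Biostars post for a good description of how to obtain intronic and intergenic coordinates. Hope this helps.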
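Introns are not listed in the GTF, but they can be derived. For each transcript, the gaps between consecutive exons are its introns. Below is a minimal sketch of that idea; it is one common approach and not necessarily the method in the linked post. It assumes Ensembl-style `transcript_id "..."` attributes, and the function name `introns_from_gtf` is hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def introns_from_gtf(lines):
    """Return BED-style intron tuples (chrom, start0, end, transcript_id, strand).

    Exon coordinates are 1-based inclusive (GTF); output is 0-based half-open.
    """
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand), ...]
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        m = TRANSCRIPT_ID_RE.search(cols[8])
        if m:
            exons[m.group(1)].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    introns = []
    for tx_id, tx_exons in exons.items():
        # Sort by genomic start; minus-strand exons are listed 5'->3' (descending)
        tx_exons.sort(key=lambda e: e[1])
        for (chrom, _, prev_end, strand), (_, next_start, _, _) in zip(tx_exons, tx_exons[1:]):
            # 1-based intron prev_end+1 .. next_start-1 -> BED [prev_end, next_start-1)
            if next_start - 1 > prev_end:  # skip directly adjacent exons
                introns.append((chrom, prev_end, next_start - 1, tx_id, strand))
    return introns
```

Because each transcript is processed separately, introns from different isoforms can overlap each other and can overlap exons of other isoforms. If you need strictly intronic sequence, one option is to subtract all exons afterwards, e.g. with `bedtools subtract`.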