Question: Obtaining hg38 gene features: promoters, 5'UTR, exons, introns, 3'UTR
Asked 2.2 years ago by Sergio Martínez Cuesta (Cambridge, UK)

Which approach do you find useful for extracting gene features (promoters, 5'UTR, exons, introns, 3'UTR) from the annotation file (genes.gtf) of a reference genome, e.g. the iGenomes UCSC hg38?

I often use the functions available in the GenomicFeatures Bioconductor package, e.g. makeTxDbFromGFF, promoters, genes, transcripts, ...

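library(GenomicFeatures) # provides makeTxDbFromGFF, promoters, exons
txdb <- makeTxDbFromGFF("genes.gtf", format="gtf") # build a TxDb object from the GTF annotation
promoters(txdb) # extracting promoters
exons(txdb) # extracting exons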

However, I was wondering which other strategies are commonly used when gene features are needed. Any ideas would be helpful.

genomic features hg38 gene • 1.7k views

Check this out.

Comment written 2.2 years ago by ATpoint
Answer written 2.2 years ago by Alex Reynolds (Seattle, WA, USA):

Use grep or awk, e.g.:

$ awk '$3=="exon"' genes.gtf > exons.gtf
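
The same awk idea might be stretched to features that are not stored as records in the GTF, such as promoters. What follows is only a rough sketch under stated assumptions: the file toy.gtf and its records are made up so the example runs on its own, the window sizes (2000 bp upstream, 200 bp downstream of the TSS) are illustrative choices, and the TSS is taken as the outermost exon coordinate of each transcript_id, which assumes every transcript ID is unique and sits on a single chromosome and strand.

```shell
# Hypothetical toy annotation (tab-separated GTF) standing in for genes.gtf.
printf 'chr1\tdemo\texon\t10000\t10200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'  > toy.gtf
printf 'chr1\tdemo\texon\t13000\t15000\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> toy.gtf
printf 'chr1\tdemo\texon\t20000\t20500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf
printf 'chr1\tdemo\texon\t21000\t22000\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf

# up/down are assumed window sizes around the TSS, not values from the thread.
awk -F'\t' -v up=2000 -v down=200 '
$3 == "exon" {
    # Extract the transcript_id value from the attribute column.
    if (match($9, /transcript_id "[^"]+"/) == 0) next
    id = substr($9, RSTART + 15, RLENGTH - 16)
    s = $4 + 0; e = $5 + 0
    # Track the outermost exon coordinates seen for this transcript.
    if (!(id in lo) || s < lo[id]) lo[id] = s
    if (!(id in hi) || e > hi[id]) hi[id] = e
    chrom[id] = $1; strand[id] = $7
}
END {
    for (id in lo) {
        # TSS is the leftmost base on "+" and the rightmost base on "-".
        if (strand[id] == "+") { s = lo[id] - up;       e = lo[id] + down - 1 }
        else                   { s = hi[id] - down + 1; e = hi[id] + up }
        if (s < 1) s = 1   # clamp at the chromosome start
        # Emit BED6: 0-based start, 1-based (exclusive) end.
        printf "%s\t%d\t%d\t%s\t.\t%s\n", chrom[id], s - 1, e, id, strand[id]
    }
}' toy.gtf | sort -k1,1 -k2,2n > promoters.bed

cat promoters.bed
```

Working from exon records rather than transcript or gene records is a deliberate choice here, since some GTF files carry only exon and CDS lines. The sketch does not clamp windows at chromosome ends, so a chromosome-sizes file would be needed for that.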

You can use BEDOPS gtf2bed to convert a GTF file to BED, and then use BEDOPS bedops and bedmap tools to calculate subsets of that BED file.

This approach can be used, for example, to get exon-intron junctions, intergenic regions, annotations that overlap SNPs with disease phenotypes of interest in promoter regions, etc.
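
To make the exon-intron case concrete, here is a hedged sketch that derives introns as the gaps between consecutive exons of the same transcript. It deliberately uses plain sort and awk rather than the BEDOPS tools, only so that it runs without extra installs; it is a stand-in for illustration, not the BEDOPS workflow itself. The file toy.gtf and its records are hypothetical, and the sketch assumes exons of one transcript do not overlap each other.

```shell
# Hypothetical toy annotation (tab-separated GTF) standing in for genes.gtf.
printf 'chr1\tdemo\texon\t10000\t10200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'  > toy.gtf
printf 'chr1\tdemo\texon\t13000\t15000\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> toy.gtf
printf 'chr1\tdemo\texon\t20000\t20500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf
printf 'chr1\tdemo\texon\t21000\t22000\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf

# Step 1: exon records -> BED6 (0-based start), named by transcript_id.
awk -F'\t' -v OFS='\t' '
$3 == "exon" {
    if (match($9, /transcript_id "[^"]+"/) == 0) next
    id = substr($9, RSTART + 15, RLENGTH - 16)
    print $1, $4 - 1, $5, id, ".", $7
}' toy.gtf |
# Step 2: group by transcript, then chromosome, then start position.
sort -k4,4 -k1,1 -k2,2n |
# Step 3: the gap between one exon end and the next exon start is an intron.
awk -F'\t' -v OFS='\t' '
$4 == prev_id && $1 == prev_chr && $2 + 0 > prev_end {
    print $1, prev_end, $2, $4, ".", $6
}
{ prev_id = $4; prev_chr = $1; prev_end = $3 + 0 }' > introns.bed

cat introns.bed
```

Because BED intervals are half-open, the end of one exon and the start of the next can be used directly as the intron boundaries without adding or subtracting 1.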
