Question: Obtaining hg38 gene features: promoters, 5'UTR, exons, introns, 3'UTR
Asked 20 months ago by Sergio Martínez Cuesta (Cambridge, UK)

Which approaches do you find useful for extracting gene features (promoters, 5' UTRs, exons, introns, 3' UTRs) from the annotation file (genes.gtf) of a reference genome, e.g. the iGenomes UCSC hg38 build?

I often use the functions available in the GenomicFeatures Bioconductor package, e.g. makeTxDbFromGFF, promoters, genes, transcripts, ...

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("genes.gtf", format = "gtf")  # build a TxDb from the GTF
promoters(txdb)  # promoters (default: 2000 bp upstream, 200 bp downstream of each TSS)
exons(txdb)      # exons

However, I was wondering which other strategies are commonly used when gene features are needed. Any ideas would be helpful.


Comment by ATpoint (20 months ago): Check this out.
Answer by Alex Reynolds (Seattle, WA, USA), 20 months ago:

Use grep or awk, e.g.:

$ awk '$3=="exon"' genes.gtf > exons.gtf
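
The same kind of awk filtering can be pushed a little further to derive features that are not explicit records in the GTF, such as promoters. The following is only a minimal sketch under stated assumptions: the toy file and its coordinates are made up, the attribute column is assumed to carry a transcript_id "..." entry, and the 2000 bp upstream / 200 bp downstream window is chosen to mirror the defaults of promoters() in R. It does not clip windows at chromosome ends, since no chromosome sizes are available here.

```shell
#!/bin/sh
# Hypothetical toy GTF (exon records only), standing in for genes.gtf.
printf 'chr1\ttoy\texon\t10000\t10100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >  toy.gtf
printf 'chr1\ttoy\texon\t10500\t10600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> toy.gtf
printf 'chr1\ttoy\texon\t50000\t50200\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf
printf 'chr1\ttoy\texon\t50400\t50500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy.gtf

awk -F'\t' -v OFS='\t' -v up=2000 -v down=200 '
$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
    # Extract the quoted transcript_id value from the attribute column.
    id = substr($9, RSTART + 15, RLENGTH - 16)
    start = $4 + 0; end = $5 + 0
    # Track the outermost exon coordinates of each transcript.
    if (!(id in lo) || start < lo[id]) lo[id] = start
    if (!(id in hi) || end > hi[id]) hi[id] = end
    chrom[id] = $1; strand[id] = $7
}
END {
    for (id in lo) {
        # The TSS is the transcript start on "+" and the transcript end on "-".
        if (strand[id] == "+") { s = lo[id] - up;       e = lo[id] + down - 1 }
        else                   { s = hi[id] - down + 1; e = hi[id] + up }
        if (s < 1) s = 1
        # Convert 1-based inclusive GTF coordinates to 0-based half-open BED.
        print chrom[id], s - 1, e, id, 0, strand[id]
    }
}' toy.gtf | sort -k1,1 -k2,2n > promoters.bed

cat promoters.bed
```

Grouping by transcript_id rather than relying on "transcript" records is a deliberate choice, because some GTF files list only exon and CDS features.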

You can use BEDOPS gtf2bed to convert a GTF file to BED, and then use BEDOPS bedops and bedmap tools to calculate subsets of that BED file.

This approach can be used, for example, to get exon-intron junctions, intergenic regions, annotations that overlap SNPs with disease phenotypes of interest in promoter regions, and so on.
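
As an illustration of deriving one such feature, introns can be computed as the gaps between consecutive exons of the same transcript. The sketch below does not use BEDOPS; it is a plain sort/awk stand-in so that it runs without extra tools. The toy file and its coordinates are hypothetical, and the attribute column is assumed to contain a transcript_id "..." entry.

```shell
#!/bin/sh
# Hypothetical toy GTF (exon records only), standing in for genes.gtf.
printf 'chr1\ttoy\texon\t10000\t10100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >  toy_exons.gtf
printf 'chr1\ttoy\texon\t10500\t10600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> toy_exons.gtf
printf 'chr1\ttoy\texon\t50000\t50200\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy_exons.gtf
printf 'chr1\ttoy\texon\t50400\t50500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n' >> toy_exons.gtf

TAB="$(printf '\t')"

# Step 1: reduce each exon to (transcript, chrom, start, end, strand).
awk -F'\t' -v OFS='\t' '
$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
    id = substr($9, RSTART + 15, RLENGTH - 16)
    print id, $1, $4, $5, $7
}' toy_exons.gtf |
# Step 2: order exons by transcript, then by start coordinate.
sort -t "$TAB" -k1,1 -k3,3n |
# Step 3: emit the gap between consecutive exons of one transcript.
awk -F'\t' -v OFS='\t' '
$1 == prev_id && $3 - 1 > prev_end {
    # GTF is 1-based inclusive and BED is 0-based half-open, so the intron
    # [prev_end + 1, start - 1] becomes BED start = prev_end, end = start - 1.
    print $2, prev_end, $3 - 1, $1, 0, $5
}
{ prev_id = $1; prev_end = $4 + 0 }
' > introns.bed

cat introns.bed
```

Introns are reported per transcript here, so overlapping isoforms of one gene will yield overlapping or duplicate intervals; merging them is a separate step.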
