Question: Obtaining hg38 gene features: promoters, 5'UTR, exons, introns, 3'UTR
Asked by Sergio Martínez Cuesta (Cambridge, UK), 13 months ago

Which approach do you find useful for extracting gene features (promoters, 5'UTRs, exons, introns, 3'UTRs) from the annotation file (genes.gtf) of a reference genome, e.g. the iGenomes UCSC hg38?

I often use the functions available in the GenomicFeatures Bioconductor package, e.g. makeTxDbFromGFF, promoters, genes, transcripts, ...

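library(GenomicFeatures) # newer Bioconductor releases provide makeTxDbFromGFF in the txdbmaker package
txdb <- makeTxDbFromGFF("genes.gtf", format = "gtf")
promoters(txdb) # extracting promoters (default: 2000 bp upstream, 200 bp downstream of each TSS)
exons(txdb) # extracting exons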
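If you want comparable promoter windows without R, here is a minimal shell/awk sketch. It assumes each transcript is described by `exon` lines carrying a `transcript_id` attribute, and that `transcript_id` is unique per chromosome and strand; it copies the `promoters()` defaults (2000 bp upstream, 200 bp downstream of the TSS). The function name `gtf_promoters` is hypothetical, and the output is 0-based, half-open BED:

```shell
# Hypothetical helper: promoter windows derived from GTF exon lines.
# Assumes transcript_id is unique per chromosome/strand and mimics the
# GenomicFeatures promoters() defaults (2000 bp upstream, 200 bp downstream).
# Output: BED6 (0-based start, end exclusive), name = transcript_id.
gtf_promoters() {
  up=${1:-2000}
  down=${2:-200}
  awk -F'\t' -v OFS='\t' -v up="$up" -v down="$down" '
    $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      key = $1 SUBSEP $7 SUBSEP substr($9, RSTART + 15, RLENGTH - 16)
      # transcript span = lowest exon start .. highest exon end
      if (!(key in lo) || $4 + 0 < lo[key]) lo[key] = $4 + 0
      if (!(key in hi) || $5 + 0 > hi[key]) hi[key] = $5 + 0
    }
    END {
      for (key in lo) {
        split(key, k, SUBSEP)
        # the TSS is the span start on "+" and the span end on "-"
        if (k[2] == "-") { s = hi[key] - down; e = hi[key] + up }
        else             { s = lo[key] - 1 - up; e = lo[key] - 1 + down }
        if (s < 0) s = 0   # clip at the chromosome start
        print k[1], s, e, k[3], ".", k[2]
      }
    }' | sort -k1,1 -k2,2n
}
```

Usage: `gtf_promoters < genes.gtf > promoters.bed`, or `gtf_promoters 1000 100 < genes.gtf` for a different window. Windows are clipped at position 0 but not at the chromosome end.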

However, I was wondering which other strategies are commonly used when gene features are needed. Any ideas would be helpful.

Tags: genomic features, hg38, gene

Comment by ATpoint (13 months ago): Check this out.
Answer by Alex Reynolds (Seattle, WA, USA), 13 months ago:

Use grep or awk to pull out any feature type listed in column 3 of the GTF, e.g. exons:

$ awk -F'\t' '$3 == "exon"' genes.gtf > exons.gtf
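To get BED rather than GTF out of the same filter, a small awk sketch can select one feature type and shift the 1-based GTF start to BED's 0-based start. The function name `gtf_feature_to_bed` is hypothetical; it keeps only six columns, so treat it as a lightweight stand-in rather than a full GTF-to-BED converter:

```shell
# Hypothetical helper: one GTF feature type (default "exon") as BED6.
# GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
# so only the start coordinate needs shifting.
gtf_feature_to_bed() {
  awk -F'\t' -v OFS='\t' -v type="${1:-exon}" '
    /^#/ { next }                 # skip header/comment lines
    $3 == type {
      id = "."
      if (match($9, /transcript_id "[^"]+"/))
        id = substr($9, RSTART + 15, RLENGTH - 16)
      print $1, $4 - 1, $5, id, ".", $7
    }' | sort -k1,1 -k2,2n
}
```

Usage: `gtf_feature_to_bed < genes.gtf > exons.bed` or `gtf_feature_to_bed CDS < genes.gtf > cds.bed`.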

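You can use BEDOPS gtf2bed to convert a GTF file to BED, and then use the BEDOPS bedops tool (set operations such as merge, intersection, difference and complement) and bedmap tool (mapping overlapping elements) to derive subsets of that BED file.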
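Introns are a typical derived subset: within a transcript they are the gaps between consecutive exons. With BEDOPS you could get a related result by subtracting exons from gene or transcript spans with `bedops --difference`, which pools all transcripts together. The awk/sort sketch below instead works per transcript and needs no extra tools; it assumes `transcript_id` is unique per chromosome and strand, and the function name `gtf_introns` is hypothetical:

```shell
# Hypothetical helper: per-transcript introns as BED6 (0-based, end-exclusive).
# Step 1: keep exon lines as chrom, start, end, strand, transcript_id.
# Step 2: sort so each transcript's exons are contiguous and ordered by start.
# Step 3: emit the gap between each exon and the furthest end seen so far.
gtf_introns() {
  awk -F'\t' -v OFS='\t' '
    $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      print $1, $4, $5, $7, substr($9, RSTART + 15, RLENGTH - 16)
    }' |
  sort -t "$(printf '\t')" -k5,5 -k1,1 -k4,4 -k2,2n |
  awk -F'\t' -v OFS='\t' '
    {
      key = $5 SUBSEP $1 SUBSEP $4
      # a gap exists only if this exon starts after the previous exon ends
      if (key == prev && $2 - 1 > last_end)
        print $1, last_end, $2 - 1, $5, ".", $4
      if (key != prev || $3 + 0 > last_end) last_end = $3 + 0
      prev = key
    }' |
  sort -k1,1 -k2,2n
}
```

Usage: `gtf_introns < genes.gtf > introns.bed`. Abutting exons produce no intron, and single-exon transcripts produce none.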

This approach can be used, for example, to get exon-intron junctions, intergenic regions, or promoter regions that overlap SNPs associated with disease phenotypes of interest.
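If your GTF has no explicit UTR lines (GTF 2.2 files typically describe exon, CDS, start_codon and stop_codon), UTRs can be derived the same way: they are the parts of coding transcripts' exons lying outside the coding extent. This sketch treats CDS plus stop_codon as the coding extent (GTF 2.2 excludes the stop codon from CDS), labels the lower-coordinate side 5UTR on "+" and 3UTR on "-" (and vice versa), and assumes unique `transcript_id` values per chromosome and strand. The function name `gtf_utrs` is hypothetical; output is BED6 plus a seventh label column:

```shell
# Hypothetical helper: 5'/3' UTRs = exon parts outside the coding extent.
# Non-coding transcripts (no CDS/stop_codon lines) are skipped.
# Output: BED6 (0-based, end-exclusive) + column 7 = "5UTR" or "3UTR".
gtf_utrs() {
  awk -F'\t' -v OFS='\t' '
    $3 ~ /^(exon|CDS|stop_codon)$/ && match($9, /transcript_id "[^"]+"/) {
      key = $1 SUBSEP $7 SUBSEP substr($9, RSTART + 15, RLENGTH - 16)
      if ($3 == "exon") {
        n++; ek[n] = key; es[n] = $4 + 0; ee[n] = $5 + 0
      } else {
        # coding extent = CDS plus stop codon
        if (!(key in cs) || $4 + 0 < cs[key]) cs[key] = $4 + 0
        if (!(key in ce) || $5 + 0 > ce[key]) ce[key] = $5 + 0
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        key = ek[i]
        if (!(key in cs)) continue   # non-coding transcript: no UTRs
        split(key, k, SUBSEP)
        lo = (k[2] == "-") ? "3UTR" : "5UTR"   # exon part below the coding start
        hi = (k[2] == "-") ? "5UTR" : "3UTR"   # exon part above the coding end
        if (es[i] < cs[key]) {
          e = (ee[i] < cs[key] - 1) ? ee[i] : cs[key] - 1
          print k[1], es[i] - 1, e, k[3], ".", k[2], lo
        }
        if (ee[i] > ce[key]) {
          s = (es[i] > ce[key] + 1) ? es[i] : ce[key] + 1
          print k[1], s - 1, ee[i], k[3], ".", k[2], hi
        }
      }
    }' | sort -k1,1 -k2,2n
}
```

Usage: `gtf_utrs < genes.gtf > utrs.bed`, then e.g. `awk '$7 == "5UTR"' utrs.bed > 5utr.bed`.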
