Question: Obtaining hg38 gene features: promoters, 5'UTR, exons, introns, 3'UTR
2.5 years ago, Sergio Martínez Cuesta (Cambridge, UK) wrote:

Which approach do you find useful for extracting gene features (promoters, 5'UTR, exons, introns, 3'UTR) from the annotation file (genes.gtf) of a reference genome, e.g. the iGenomes UCSC hg38?

I often use the functions available in the GenomicFeatures Bioconductor package, e.g. makeTxDbFromGFF, promoters, genes, transcripts, ...

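library(GenomicFeatures) # provides the functions used below
txdb <- makeTxDbFromGFF("genes.gtf", format="gtf") # build a transcript database from the GTF
promoters(txdb) # extracting promoters
exons(txdb) # extracting exons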

However, I was wondering which other strategies are commonly used when gene features are needed. Any ideas would be helpful.

genomic features hg38 gene • 1.9k views

Check this out.

Comment written 2.5 years ago by ATpoint
2.5 years ago, Alex Reynolds (Seattle, WA USA) wrote:

Use grep or awk, e.g.:

$ awk '$3=="exon"' genes.gtf > exons.gtf
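The same filter extends to any feature type found in column 3 of the GTF. A minimal sketch follows, run on a made-up three-line GTF. The file names `toy.gtf`, `feature_counts.txt` and `toy_exons.gtf` are hypothetical, and the feature types in a real genes.gtf may differ, so it helps to count them first:

```shell
# Toy GTF (tab-separated); these records are invented for illustration only.
printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > toy.gtf
printf 'chr1\ttoy\tCDS\t150\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n' >> toy.gtf
printf 'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> toy.gtf

# Count how often each feature type (column 3) occurs, skipping comment lines.
awk -F'\t' '!/^#/ {n[$3]++} END {for (f in n) print f, n[f]}' toy.gtf | sort > feature_counts.txt

# Keep only the exon records, as in the one-liner above.
awk -F'\t' '$3 == "exon"' toy.gtf > toy_exons.gtf
```

Setting `-F'\t'` makes awk split on tabs only, so spaces inside the attribute column cannot shift the fields.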

You can use BEDOPS gtf2bed to convert a GTF file to BED, and then use BEDOPS bedops and bedmap tools to calculate subsets of that BED file.

This approach can be used, for example, to get exon-intron junctions, intergenic regions, or annotations in promoter regions that overlap SNPs with disease phenotypes of interest.
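To illustrate the kind of interval arithmetic involved, here is a rough sketch that uses plain awk in place of gtf2bed and bedops, so it is not the BEDOPS workflow itself. It assumes a made-up two-exon transcript, hypothetical file names, and that exons of one transcript do not overlap. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so only the start is shifted by one:

```shell
# Toy exon records (tab-separated); invented for illustration only.
printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > toy_exons.gtf
printf 'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> toy_exons.gtf

TAB="$(printf '\t')"

# GTF -> BED: start becomes start-1, end is unchanged; column 4 holds the transcript_id.
awk -F'\t' 'BEGIN {OFS="\t"} {
  match($9, /transcript_id "[^"]+"/)
  id = substr($9, RSTART + 15, RLENGTH - 16)   # strip the key and the surrounding quotes
  print $1, $4 - 1, $5, id, ".", $7
}' toy_exons.gtf | sort -t "$TAB" -k4,4 -k1,1 -k2,2n > toy_exons.bed

# Introns: the gap between consecutive exons of the same transcript.
awk -F'\t' 'BEGIN {OFS="\t"}
  $4 == prev_id && $1 == prev_chr { print $1, prev_end, $2, $4, ".", $6 }
  { prev_id = $4; prev_chr = $1; prev_end = $3 }' toy_exons.bed > toy_introns.bed
```

The boundaries of each intron interval are the exon-intron junctions. On real data, a dedicated tool is likely the safer choice, since it handles sorting and edge cases that this sketch ignores.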
