Question: Obtaining hg38 gene features: promoters, 5'UTR, exons, introns, 3'UTR
Asked 8 months ago by Sergio Martínez Cuesta (Cambridge, UK)

Which approach do you find useful for extracting gene features (promoters, 5'UTR, exons, introns, 3'UTR) from the annotation file (genes.gtf) of a reference genome, e.g. the iGenomes UCSC hg38?

I often use the functions available in the GenomicFeatures Bioconductor package, e.g. makeTxDbFromGFF, promoters, genes, transcripts, ...

library(GenomicFeatures)
txdb <- makeTxDbFromGFF("genes.gtf", format="gtf")
promoters(txdb) # extracting promoters
exons(txdb) # extracting exons

However, I was wondering which other strategies are commonly used when gene features are needed. Any ideas would be helpful.

Comment by ATpoint, 8 months ago:

Check this out.
Answer by Alex Reynolds (Seattle, WA, USA), 8 months ago:

Use grep or awk, e.g.:

$ awk '$3=="exon"' genes.gtf > exons.gtf

You can use BEDOPS gtf2bed to convert a GTF file to BED, and then use BEDOPS bedops and bedmap tools to calculate subsets of that BED file.
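If BEDOPS is not at hand, the coordinate shift behind a GTF-to-BED conversion can be sketched with awk alone. This is a minimal illustration, not a reproduction of gtf2bed's output layout: the toy records, the file names toy.gtf and exons.bed, and the choice of transcript_id as the BED name column are all assumptions made for the example.

```shell
#!/bin/sh
# Toy GTF (hypothetical records, tab-separated) so the sketch runs on its own.
{
  printf 'chr1\tdemo\texon\t1001\t1100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\tCDS\t1051\t1100\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\texon\t1501\t1600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr2\tdemo\texon\t5001\t5200\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
} > toy.gtf

# Keep exon records and write BED6. GTF is 1-based inclusive, BED is 0-based
# half-open, so only the start coordinate shifts by one.
awk -F'\t' -v OFS='\t' '
  $3 == "exon" {
    id = "."
    # Pull the transcript_id value out of the attribute column (column 9).
    if (match($9, /transcript_id "[^"]+"/)) {
      id = substr($9, RSTART + 15, RLENGTH - 16)
    }
    print $1, $4 - 1, $5, id, 0, $7
  }
' toy.gtf | sort -k1,1 -k2,2n > exons.bed

cat exons.bed
```

Splitting on tabs (-F'\t') keeps the attribute column intact as a single field, which matters once you need anything beyond the first eight columns.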

This approach can be used, for example, to get exon-intron junctions, intergenic regions, or annotations that overlap SNPs with disease phenotypes of interest in promoter regions.
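As a rough sketch of how such derived features can be computed from exon records alone, the script below builds introns (gaps between consecutive exons of one transcript) and promoter windows around each transcript start. Everything specific here is an assumption for illustration: the toy records, the file names, the use of transcript_id for grouping, and the 2000 bp upstream / 200 bp downstream window. Promoter ends are not clipped to chromosome lengths, which a real workflow would need to handle.

```shell
#!/bin/sh
# Toy exon records (hypothetical, tab-separated GTF) for two transcripts.
{
  printf 'chr1\tdemo\texon\t3001\t3100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\texon\t3501\t3600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\texon\t4001\t4200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr2\tdemo\texon\t1001\t1200\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
  printf 'chr2\tdemo\texon\t1501\t1700\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
} > toy_exons.gtf

TAB=$(printf '\t')

# Reduce to: transcript, chrom, strand, start, end (still 1-based),
# sorted by transcript and then by start.
awk -F'\t' -v OFS='\t' '
  $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
    print substr($9, RSTART + 15, RLENGTH - 16), $1, $7, $4, $5
  }
' toy_exons.gtf | sort -t "$TAB" -k1,1 -k4,4n > exons.tsv

# Introns: the gap between one exon end and the next exon start within the
# same transcript, written as 0-based half-open BED intervals.
awk -F'\t' -v OFS='\t' '
  $1 == tx && $4 > prev_end + 1 { print $2, prev_end, $4 - 1, tx, 0, $3 }
  { tx = $1; prev_end = $5 }
' exons.tsv > introns.bed

# Promoters: the TSS is the smallest exon start on "+" and the largest exon
# end on "-"; the window extends `up` bases upstream and `down` downstream.
awk -F'\t' -v OFS='\t' -v up=2000 -v down=200 '
  !($1 in chrom) { chrom[$1] = $2; strand[$1] = $3; first[$1] = $4 + 0; order[++n] = $1 }
  $5 + 0 > last[$1] + 0 { last[$1] = $5 + 0 }
  END {
    for (i = 1; i <= n; i++) {
      t = order[i]
      if (strand[t] == "+") { s = first[t] - up - 1; e = first[t] + down - 1 }
      else                  { s = last[t] - down;    e = last[t] + up }
      if (s < 0) s = 0   # do not run off the start of the chromosome
      print chrom[t], s, e, t, 0, strand[t]
    }
  }
' exons.tsv > promoters.bed

cat introns.bed promoters.bed
```

The resulting BED files can then be fed to set-operation tools such as bedops or bedmap for overlaps with other annotations.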
