I am rather new to bioinformatics (PhD student) and am stuck on one part of the pipeline my advisor and I are setting up. I am trying to build a genomic feature plot that covers promoter, TSS, intron, exon, intergenic, 5'UTR, 3'UTR, and CDS. We want the final output to be a stacked bar graph showing the percentage of reads in each category. I found a GTF annotation file from GENCODE that contains gene, exon, transcript, start_codon, stop_codon, 5'UTR, 3'UTR, and CDS features. As far as I can tell, that gives me everything except promoter, intron, and intergenic, and I would eventually need to remove the gene and transcript records, since they overlap several of the other features. Do I also need to account for the overlap between CDS and exon records in the GTF file? How would I go about scripting this?
I would really like to use the resulting annotation with featureCounts, as it is a rather straightforward program.
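To make the question more concrete, here is a rough sketch of the kind of interval arithmetic I think is needed for the missing features. It is only an illustration under assumptions: the function names `introns_from_exons` and `promoter_window` are made up, the 1000 bp upstream window is an arbitrary placeholder rather than an agreed promoter definition, and coordinates are treated as 1-based and inclusive, as in GTF.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive, as in GTF


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of ONE transcript.

    Assumes the exons passed in all belong to the same transcript
    and do not overlap each other.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit an intron if there is at least one base between exons.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def promoter_window(tx_start: int, tx_end: int, strand: str,
                    upstream: int = 1000) -> Interval:
    """Hypothetical promoter: `upstream` bases immediately before the TSS.

    On the '+' strand the TSS is the transcript start; on the '-' strand
    it is the transcript end. The window is clamped at position 1 but NOT
    at the chromosome end, which would need the chromosome lengths.
    """
    if strand == "+":
        return (max(1, tx_start - upstream), tx_start - 1)
    return (tx_end + 1, tx_end + upstream)


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    print(introns_from_exons(exons))
    print(promoter_window(5000, 8000, "+"))
    print(promoter_window(5000, 8000, "-"))
```

Intergenic regions are not covered here; I assume they would be whatever is left of each chromosome after removing all gene intervals (plus promoters, if those are counted separately).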