I am trying to calculate the enrichment of structural variant (SV) breakpoints and single-nucleotide variant (SNV) locations in genomic features (exon, intron, 5'UTR...) in a non-human genome. To know whether a feature is over- or underrepresented in an SV/SNV dataset, I first need to know what fraction of the genome is made up of feature X, for example the total length of all exons across the genome.
Is there an existing resource that has this sort of information (for annotated genomes like Drosophila)? If not, is there an R package that can calculate this?
Secondly, it's not clear to me how this is calculated in the first place. If a gene has 10 transcript variants, how do we calculate that gene's total exonic region? Total exon length summed over all variants / 10? Add up all the exons from the longest transcript variant? Take the longest possible transcript length (longest exon 1 + longest exon 2 + ...)? I'd love to know how this is usually calculated.
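To make the options concrete, here is a toy sketch in plain Python (only to illustrate the arithmetic, not tied to any package). The gene, its coordinates, and all function names are made up. It compares the three calculations above, plus a fourth candidate that is my own assumption rather than something I know to be standard: the union of all exon intervals, where each exonic base is counted once.

```python
from itertools import zip_longest

# Hypothetical gene with three transcript variants. Exons are 1-based,
# inclusive (start, end) pairs, as in GFF/GTF. All coordinates are made up.
transcripts = {
    "tx_a": [(100, 199), (300, 399)],
    "tx_b": [(100, 249), (300, 399), (500, 549)],
    "tx_c": [(150, 199), (320, 449)],
}

def exon_length(exon):
    start, end = exon
    return end - start + 1  # inclusive coordinates

def transcript_length(exons):
    return sum(exon_length(e) for e in exons)

def mean_transcript_length(txs):
    # Option 1: total exon length over all variants / number of variants.
    return sum(transcript_length(e) for e in txs.values()) / len(txs)

def longest_transcript_length(txs):
    # Option 2: exons of the longest transcript variant only.
    return max(transcript_length(e) for e in txs.values())

def longest_exon_per_rank(txs):
    # Option 3: longest exon 1 + longest exon 2 + ..., matching exons
    # purely by their position in each transcript (an assumption).
    total = 0
    for rank in zip_longest(*txs.values()):
        total += max(exon_length(e) for e in rank if e is not None)
    return total

def union_length(txs):
    # Option 4 (my assumption): merge overlapping or adjacent exons from
    # all variants, then count every exonic base exactly once.
    exons = sorted(e for tx in txs.values() for e in tx)
    total = 0
    cur_start, cur_end = exons[0]
    for start, end in exons[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    total += cur_end - cur_start + 1
    return total

if __name__ == "__main__":
    print("mean over variants:   ", round(mean_transcript_length(transcripts), 1))
    print("longest variant:      ", longest_transcript_length(transcripts))
    print("longest exon per rank:", longest_exon_per_rank(transcripts))
    print("union of all exons:   ", union_length(transcripts))
```

On this toy gene the four approaches give 226.7, 300, 330 and 350 bp respectively, so the choice clearly changes the denominator of any enrichment calculation.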
I've had a look at GenomicFeatures, but it's not clear to me how I would do this with that package. Any suggestions or advice would be very welcome.