I downloaded the NCBI RefSeq curated file from the Genes and Gene Predictions group of the UCSC Table Browser for hg38 because I want to use the exon coordinates as a target file for calling variants on exome sequencing data.
I noticed, however, that these exon coordinates cover approximately twice as much of the genome as the hg19 exon coordinates did (~80 million bp vs. ~40 million bp). Is it possible that the exome is really twice as large in hg38?
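To rule out that the ~80 million figure is just the same bases counted several times across overlapping transcripts, the footprint can be computed on merged intervals. Below is a minimal sketch in Python, assuming the exons are available as (chrom, start, end) tuples in 0-based, half-open BED-style coordinates. The function name `merged_footprint` and the toy intervals are purely illustrative and are not taken from my actual file:

```python
from collections import defaultdict


def merged_footprint(intervals):
    """Return the number of distinct bases covered by (chrom, start, end) intervals.

    Coordinates are assumed to be 0-based, half-open (BED style).
    Overlapping or directly adjacent intervals are counted only once.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current block and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical toy data: two overlapping exons, one separate exon, one on chr2.
toy_exons = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 300, 400),
    ("chr2", 0, 50),
]
print(merged_footprint(toy_exons))
```

If the merged total for the real file were still close to 80 million bp, overlapping transcripts alone would not explain the difference.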
I do not want to call variants on all of these regions, since ~30% of these exonic regions are not covered at all in my WES data and another ~10% are covered at <10x. I would definitely like to exclude these regions from the target file, but I do not fully understand what they are or why they were included in the first place.
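The filtering I have in mind would look roughly like the sketch below, assuming a mean depth per target region has already been computed from the WES data. The function name `filter_targets`, the tuple layout, and the example depths are hypothetical, and 10x is simply the cutoff mentioned above:

```python
def filter_targets(regions, min_mean_depth=10.0):
    """Keep only regions whose mean depth is at least min_mean_depth.

    regions: iterable of (chrom, start, end, mean_depth) tuples.
    Returns a list of (chrom, start, end) tuples in the input order.
    """
    return [
        (chrom, start, end)
        for chrom, start, end, mean_depth in regions
        if mean_depth >= min_mean_depth
    ]


# Hypothetical toy data: one uncovered, one low-coverage, one well-covered region.
toy_regions = [
    ("chr1", 100, 200, 0.0),
    ("chr1", 300, 400, 7.5),
    ("chr1", 500, 600, 42.0),
]

# Print the surviving regions as tab-separated, BED-like lines.
for chrom, start, end in filter_targets(toy_regions):
    print(chrom, start, end, sep="\t")
```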
Any help would be greatly appreciated.