I'm doing a college project where I need to call variants on some human cancer samples. I've chosen to align the reads to the GRCh38.p10 assembly, but now I'm having a hard time finding the appropriate 1000 Genomes Indels VCF files to run BaseRecalibrator and downstream commands.
The latest GATK bundle seems to have VCFs for hg38, but I presume all the chromosome names are in the UCSC format (chr1, chr2, ...) and therefore not compatible with my GRCh38 assembly.
My question is: can I use the Mills gold standard indels for build GRCh37, or does somebody know where I can find the latest ones for the GRCh38 assembly?
Also, the study that my samples are derived from used the Agilent SureSelect Human All Exon v4 kit for exome sequencing, and I read on the GATK website that I should use the -L argument with a custom BED file when running BaseRecalibrator on exome sequencing data. Does anyone have the corresponding BED file? I tried going to Agilent's eArray site, but it appears to be down for me.
Failing that, I tried Pierre Lindenbaum's command from this post to generate my own BED file, but it didn't work for me because it is based on hg38, so the chromosome names don't match my reference. I can fix that with a simple text replacement script, but I was wondering whether the hg38 exome coordinates are the same as the ones for the GRCh38 assembly.
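For reference, the text replacement I have in mind is roughly the sketch below. It assumes my reference uses Ensembl-style names (1 to 22, X, Y, MT), which may not be true of every GRCh38 FASTA, and it only handles the primary chromosomes; anything else (alt, random, and unplaced contigs) is dropped rather than guessed at. The function names are my own and not part of any tool.

```python
import re

# Assumption: the reference FASTA names primary contigs 1..22, X, Y, MT.
# UCSC-style primary autosomes and sex chromosomes: chr1..chr22, chrX, chrY.
_PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def ucsc_to_ensembl(name):
    """Map a UCSC contig name to an Ensembl-style one, or None if unmapped."""
    if name == "chrM":
        return "MT"
    match = _PRIMARY.match(name)
    return match.group(1) if match else None


def rename_bed_lines(lines):
    """Yield BED lines with column 1 renamed.

    Header lines (track/browser/#) and blank lines pass through unchanged;
    records on contigs without a mapping are skipped.
    """
    for line in lines:
        if line.startswith(("track", "browser", "#")) or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        new_name = ucsc_to_ensembl(fields[0])
        if new_name is None:
            continue  # alt/random/unplaced contig: no safe mapping here
        fields[0] = new_name
        yield "\t".join(fields) + "\n"
```

I would run it by opening the hg38 BED file, passing the file object to `rename_bed_lines`, and writing the yielded lines to a new file. Only the names change; the start and end columns are copied as-is, which is exactly why I need to know whether the coordinates agree.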
Thanks for your help.