I am running a variant calling pipeline for cancer samples. It includes Mutect2.
I am working with human data, so I started with the reference and dbSNP files from the GATK bundle for hg38 (ftp://ftp.broadinstitute.org/bundle/hg38). I picked the following files:
Homo_sapiens_assembly38.fasta.gz dbsnp_146.hg38.vcf.gz dbsnp_146.hg38.vcf.gz.tbi
With Mutect2, you can supply a database of known somatic variants using "--cosmic". Since I started the pipeline with the hg38 reference file, I picked the GRCh38 COSMIC file (https://cancer.sanger.ac.uk/cosmic/files?data=/files/grch38/cosmic/v79/VCF/CosmicCodingMuts.vcf.gz). From my understanding, hg38 corresponds to the UCSC naming and GRCh38 to the NCBI naming, but I thought the two would be close enough to use together.
Then, when I run Mutect2, I get the following error: "Input files cosmic and reference have incompatible contigs. Error details: The contig order in cosmic and reference is not the same"
I corrected the chromosome names (1 -> chr1, MT -> chrM, etc.) in the CosmicCodingMuts.vcf file, then sorted it using Picard SortVcf. However, I am still stuck with the same kind of error in Mutect2.
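For reference, the renaming and sorting I am aiming for looks roughly like the sketch below. It is only an illustration in plain Python, not what Picard does internally: the function names are made up, the chromosome mapping (1 -> chr1, MT -> chrM) is my assumption, and the contig order and lengths are taken from a .fai index of the reference FASTA. Records on contigs absent from the reference are dropped, and the ##contig header lines are regenerated from the index, on the assumption that stale header lines could also trigger the contig-order check.

```python
# Sketch only: rename COSMIC contigs to UCSC style and sort records in the
# contig order of the reference .fai index. All names here are illustrative.

def load_contigs(fai_lines):
    """Return [(name, length), ...] in the order listed in the .fai index."""
    contigs = []
    for line in fai_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            contigs.append((parts[0], int(parts[1])))
    return contigs


def to_ucsc(name):
    """Assumed mapping: 1 -> chr1, X -> chrX, MT -> chrM."""
    if name.startswith("chr"):
        return name
    return "chrM" if name in ("MT", "M") else "chr" + name


def rename_and_sort(vcf_lines, contigs):
    """Return VCF lines with renamed contigs, sorted by (reference rank, position)."""
    rank = {name: i for i, (name, _) in enumerate(contigs)}
    meta, column_header, records = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##contig"):
            continue  # old ##contig lines describe the old names and order
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#"):
            column_header.append(line)
        else:
            fields = line.split("\t")
            fields[0] = to_ucsc(fields[0])
            if fields[0] in rank:  # drop contigs missing from the reference
                records.append(fields)
    # Sort numerically by reference rank, then by position (not lexicographically).
    records.sort(key=lambda f: (rank[f[0]], int(f[1])))
    contig_meta = ["##contig=<ID=%s,length=%d>" % c for c in contigs]
    return meta + contig_meta + column_header + ["\t".join(f) for f in records]
```

The input lines would come from the decompressed CosmicCodingMuts.vcf, and the output would still need to be written out, compressed and indexed again before being passed to "--cosmic".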
My questions are: 1) How can I modify the COSMIC VCF to match the hg38 reference? 2) If 1) is not possible, where can I retrieve a compatible set of files (genome_ref + germline_snp + somatic_snp)?