5 weeks ago
gernophil

Hey everyone,

I am doing a lot of variant calling. So far, I have always used the Ensembl reference genomes with the "number only" nomenclature (e.g. 1 instead of chr1) for the main chromosomes. My default workflow (very simplified) is: map FASTQ to the Ensembl reference genome -> call variants -> annotate variants with VEP.

I prefer the VEP cache over GTF/GFF files for annotation, since Ensembl recommends it. Now I need to use a premade panel of normals (PoN) for calling somatic variants with Mutect2. I know it would be best to have a PoN matching my samples, but I don't have one. So I am using the PoN from the 1000 Genomes project as recommended by GATK (1000g_pon.hg38.vcf.gz) and the gnomAD germline resource (af-only-gnomad.hg38.vcf). The problem is that, AFAIK, these files only exist with UCSC chromosome nomenclature, not with Ensembl nomenclature. I know there are ways to rename the contigs, but since these files contain so many non-standard contigs, I have the feeling that it might get a little messy.
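To illustrate why the non-standard contigs make renaming messy, here is a minimal Python sketch of a UCSC-to-Ensembl contig-name mapping for GRCh38. It assumes that Ensembl names the unlocalized/unplaced scaffolds by their GenBank accession (UCSC `chr1_KI270706v1_random` becomes Ensembl `KI270706.1`) and that the ALT, decoy, `chrEBV` and HLA contigs in the GATK hg38 files have no counterpart in the Ensembl primary assembly. The function name `ucsc_to_ensembl` is hypothetical.

```python
import re
from typing import Optional

# Primary chromosomes: UCSC "chr1".."chr22", "chrX", "chrY", "chrM"
# -> Ensembl "1".."22", "X", "Y", "MT".
_PRIMARY = {f"chr{i}": str(i) for i in range(1, 23)}
_PRIMARY.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})

# Unlocalized ("chr1_KI270706v1_random") and unplaced ("chrUn_KI270302v1") scaffolds.
# Assumption: Ensembl names these by GenBank accession, i.e. "KI270706v1" -> "KI270706.1".
_SCAFFOLD = re.compile(r"^chr(?:[0-9]+|X|Y|Un)_([A-Z]+[0-9]+)v([0-9]+)(?:_random)?$")


def ucsc_to_ensembl(name: str) -> Optional[str]:
    """Return the assumed Ensembl GRCh38 name for a UCSC/GATK hg38 contig, or None."""
    if name in _PRIMARY:
        return _PRIMARY[name]
    match = _SCAFFOLD.match(name)
    if match:
        return f"{match.group(1)}.{match.group(2)}"
    # Anything else (*_alt, *_decoy, chrEBV, HLA-* contigs) is assumed to have
    # no counterpart in the Ensembl primary assembly and would have to be dropped.
    return None


for contig in ["chr1", "chrM", "chr1_KI270706v1_random", "chrUn_KI270302v1",
               "chr1_KI270762v1_alt", "chrEBV"]:
    print(contig, "->", ucsc_to_ensembl(contig))
# chr1 -> 1
# chrM -> MT
# chr1_KI270706v1_random -> KI270706.1
# chrUn_KI270302v1 -> KI270302.1
# chr1_KI270762v1_alt -> None
# chrEBV -> None
```

The `None` cases are exactly the records that cannot simply be renamed and would have to be discarded.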

So my current workaround is to trim the PoN and gnomAD VCFs to the standard chromosomes, since my BAM files are also restricted to the CDS of some genes that all lie on the main chromosomes. Renaming those contigs is straightforward. However, I don't know whether Mutect2 also uses information from parts of the genome other than those covered by the BAM file. Could someone comment on that?
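The trimming-and-renaming workaround can be sketched in plain Python as below. This is a minimal sketch, not an official GATK procedure: it assumes "standard chromosomes" means chr1-chr22, chrX, chrY and chrM, keeps only data records and `##contig` header lines on those, and renames them to Ensembl style (chrM becomes MT). The function names and the script name `trim_vcf.py` are hypothetical.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator

# Assumption: "standard chromosomes" = chr1-chr22, chrX, chrY, chrM.
STANDARD = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}

_CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def rename(ucsc: str) -> str:
    """chr1 -> 1, chrX -> X, chrM -> MT (Ensembl name for the mitochondrion)."""
    return "MT" if ucsc == "chrM" else ucsc[3:]


def trim_and_rename(lines: Iterable[str]) -> Iterator[str]:
    """Keep only standard-chromosome records and contig headers, renamed to Ensembl style."""
    for line in lines:
        if line.startswith("##contig="):
            m = _CONTIG_ID.match(line)
            if m is None or m.group(1) not in STANDARD:
                continue  # drop header entries for non-standard contigs
            yield "##contig=<ID=" + rename(m.group(1)) + line[m.end():]
        elif line.startswith("#"):
            yield line  # other meta lines and the #CHROM line pass through unchanged
        else:
            chrom, sep, rest = line.partition("\t")
            if chrom in STANDARD:
                yield rename(chrom) + sep + rest


def main(in_path: str, out_path: str) -> None:
    # gzip can read bgzipped input; output is written as plain, uncompressed VCF.
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(trim_and_rename(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python trim_vcf.py 1000g_pon.hg38.vcf.gz 1000g_pon.ensembl.vcf`. Mutect2 expects indexed resource files, so the output would still need to be compressed with `bgzip` and indexed with `tabix -p vcf` (or `gatk IndexFeatureFile`) before use.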

Still, I am looking for a really clean way to deal with this, which would be either (i) mapping to a UCSC reference genome and finding a tool to annotate UCSC-named VCFs analogous to VEP, or (ii) finding an "official" 1000 Genomes PoN and germline resource for Ensembl-named reference genomes.

