Question: NA12878 High Confidence Callset Sequence Dictionaries
7 weeks ago
I want to compare a WGS VCF callset that I produced with the GATK Best Practices against the NA12878 WGS gold-standard/high-confidence VCF callset. However, an error about differing sequence dictionary sizes is preventing me from performing any concordance analysis.

I have downloaded the NA12878 "High Confidence Callset" (I have tried both this specific release and the one under "latest"). When I try to compare this .vcf to a .vcf I have produced, I get an error saying that the dictionary sizes are different (details below).

From what I've gathered so far, this error most likely arises from aligning to different reference genomes. I first got this error when aligning and calling variants myself with the human_g1k_v37 reference genome (I have been unable to find out which reference genome these gold-standard VCF files were produced against). I then downloaded the RMNISTHS_30xdownsample.bam file. The README mentions it was aligned with BWA-MEM but not to which reference genome. I assumed that this BAM from the NCBI FTP was aligned to the same reference genome as the VCF from the same FTP, but I still get the same error.
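One way to test the different-reference hypothesis is to compare the ##contig lines of the two VCF headers directly, since that is where a VCF's sequence dictionary comes from. Below is a minimal Python sketch, assuming both files carry ##contig header lines with ID as the first key; the function names are hypothetical, not part of GATK or Picard:

```python
import gzip


def parse_contigs(lines):
    """Return contig IDs from ##contig header lines, in header order.

    Stops at the #CHROM line, so only the header is scanned.
    Assumes ID is the first key inside <...>, as is usual in VCF headers.
    """
    contigs = []
    for line in lines:
        if line.startswith("##contig=<"):
            body = line.strip()[len("##contig=<"):].rstrip(">")
            key, _, rest = body.partition("=")
            if key == "ID":
                # The ID value runs up to the next comma (or the end of the body)
                contigs.append(rest.split(",", 1)[0])
        elif line.startswith("#CHROM"):
            break
    return contigs


def read_contigs(path):
    """Read contig IDs from a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return parse_contigs(fh)


def compare_contigs(call, truth):
    """Summarise how two contig lists differ (sizes and name differences)."""
    call_set, truth_set = set(call), set(truth)
    return {
        "call_size": len(call),
        "truth_size": len(truth),
        "shared": sorted(call_set & truth_set),
        "only_call": sorted(call_set - truth_set),
        "only_truth": sorted(truth_set - call_set),
    }
```

For example, `compare_contigs(read_contigs("myinput.vcf.gz"), read_contigs("NIST_RTG_PlatGen_merged_highconfidence_v0.2_Allannotate.vcf.gz"))` would be expected to report sizes like the (25, 181) in the error. The `only_call` and `only_truth` lists then show whether the names differ only in style (for example `chr1` versus `1`) or whether the contig sets genuinely differ.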

The GenotypeConcordance command that produces the error is as follows:

/path/to/gatk GenotypeConcordance -CV=/path/to/myinput.vcf.gz -O=/path/to/output.vcf -TV=/path/to/NIST_RTG_PlatGen_merged_highconfidence_v0.2_Allannotate.vcf.gz

The error:

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence Dictionaries are not the same size (25, 181)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(
        at picard.vcf.GenotypeConcordance.doWork(
        at picard.cmdline.CommandLineProgram.instanceMain(
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(
        at org.broadinstitute.hellbender.Main.mainEntry(
        at org.broadinstitute.hellbender.Main.main(

If anyone knows of a tool (within GATK or elsewhere) that ignores differing sequence dictionary lengths, that would be awesome. An earlier version of GATK had an option for this, but I can't find it in GATK4.

Thanks in advance for any ideas/help/advice

written 7 weeks ago by ilee660
7 weeks ago
France/Nantes/Institut du Thorax - INSERM UMR1087
See the thread "changing of chromosome notation in CHROM columns of vcf file".
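The idea in that thread is to rewrite contig names so both files use the same convention. A minimal Python sketch of that renaming, assuming one file uses UCSC-style names (`chr1`) and the other b37-style names (`1`); the mapping and function names are hypothetical and only cover primary chromosomes. Unplaced contigs such as `chr1_gl000191_random` are not translated to their b37 names, and `chrM` to `MT` is a rename only, since the hg19 and b37 mitochondrial sequences differ:

```python
import re

# Hypothetical special cases for UCSC-style -> b37-style names
SPECIAL = {"chrM": "MT"}

# Matches the ID value at the start of a ##contig header line
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def to_b37_name(name):
    """Map a UCSC-style contig name to b37 style; leave other names untouched."""
    if name in SPECIAL:
        return SPECIAL[name]
    if name.startswith("chr"):
        return name[3:]
    return name


def rename_line(line, rename=to_b37_name):
    """Rename the contig in a ##contig header line or a data line's CHROM field."""
    if line.startswith("##contig=<ID="):
        return _CONTIG_ID.sub(lambda m: m.group(1) + rename(m.group(2)), line, count=1)
    if line.startswith("#"):
        return line  # other header lines (including #CHROM) are unchanged
    chrom, sep, rest = line.partition("\t")
    return rename(chrom) + sep + rest


def rename_vcf(lines, rename=to_b37_name):
    """Yield the lines of a VCF with contig names rewritten."""
    for line in lines:
        yield rename_line(line, rename)
```

A different `rename` function can be passed to convert in the other direction. Input can be read with `gzip.open(path, "rt")`, but writing an uncompressed `.vcf` is simpler, since Python's `gzip` module writes plain gzip rather than BGZF. Renaming does not change the number of ##contig lines, so if the headers still list different contig sets, the dictionaries may still need to be made to match.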

written 7 weeks ago by Pierre Lindenbaum

Thank you Pierre. Exactly what I was looking for.

written 7 weeks ago by ilee660
