Question: NA12878 High Confidence Callset Sequence Dictionaries
12 months ago by
ilee66 wrote:

I want to compare a WGS VCF callset that I produced using the GATK Best Practices with the NA12878 WGS gold-standard/high-confidence VCF callset, but an error about differing sequence dictionary sizes is preventing me from performing any concordance analysis.

I have downloaded the NA12878 "High Confidence Callset" (I have tried this specific release as well as the one under "latest"), and when I try to compare this .vcf to a .vcf I have produced, I get an error saying that the dictionary sizes are different (command and error below).

From what I've gathered so far, this error likely arises from alignment against different reference genomes. I first got it with a callset I aligned and called myself using the human_g1k_v37 reference genome (I have been unable to find out which reference genome the gold-standard VCF files were built on). I then also downloaded the RMNISTHS_30xdownsample.bam file (the README mentions that it was aligned with BWA MEM, but not which reference genome was used). I assumed that the RMNISTHS_30xdownsample.bam file from the NCBI FTP was aligned to the same reference genome as the VCF from the same FTP, but I still get the same error.

The GenotypeConcordance code that produces the error is as follows:

/path/to/gatk GenotypeConcordance -CV=/path/to/myinput.vcf.gz -O=/path/to/output.vcf -TV=/path/to/NIST_RTG_PlatGen_merged_highconfidence_v0.2_Allannotate.vcf.gz

The error:

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence Dictionaries are not the same size (25, 181)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(
        at picard.vcf.GenotypeConcordance.doWork(
        at picard.cmdline.CommandLineProgram.instanceMain(
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(
        at org.broadinstitute.hellbender.Main.mainEntry(
        at org.broadinstitute.hellbender.Main.main(
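
The two numbers in the exception (25, 181) are the sizes of the sequence dictionaries being compared, so a useful first check is to list which contigs each VCF header declares. Below is a minimal sketch, assuming both files carry `##contig` lines in their headers; the helper names `parse_contigs`, `read_contigs`, and `compare_contigs` are made up for illustration and are not part of GATK or Picard.

```python
import gzip
import re

# Matches a whole "##contig=<...>" header line and captures what is inside <...>.
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def parse_contigs(lines):
    """Return {contig_name: length or None} from the header lines of a VCF."""
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line (or a record): meta lines are over
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Simple key=value split; quoted values containing commas are not handled.
        fields = dict(
            item.split("=", 1) for item in match.group(1).split(",") if "=" in item
        )
        if "ID" in fields:
            length = fields.get("length")
            contigs[fields["ID"]] = (
                int(length) if length and length.isdigit() else None
            )
    return contigs


def read_contigs(path):
    """Read the contig declarations from a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_contigs(handle)


def compare_contigs(a, b):
    """Return (names only in a, names only in b, shared names with different lengths)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, length_mismatch
```

Calling `compare_contigs(read_contigs("/path/to/myinput.vcf.gz"), read_contigs("/path/to/truth.vcf.gz"))` (the second path is a placeholder) would show whether the two files differ in naming (for example `1` versus `chr1`), in the set of extra contigs, or in contig lengths, which in turn hints at whether they come from different reference builds.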

If anyone knows of a tool (within GATK or elsewhere) that disregards the differing sequence dictionary sizes, that would be awesome. An earlier version of GATK had an option for this, but I can't find it in GATK4.

Thanks in advance for any ideas, help, or advice.

12 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:

See: "changing of chromosome notation in CHROM columns of vcf file"
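
For illustration only, here is a minimal sketch of what changing the chromosome notation amounts to, assuming the mismatch is a `chr1`-versus-`1` naming difference. The direction of the renaming, the `chrM` to `MT` convention, and the helper names `strip_chr_mapping` and `rename_vcf_line` are assumptions made for this example. It rewrites only the `##contig` header IDs and the CHROM column, so contig names appearing elsewhere (for example in breakend ALT alleles) are left untouched.

```python
import re

# Captures the prefix and the ID of a "##contig=<ID=...>" header line.
CONTIG_ID_RE = re.compile(r"^(##contig=<ID=)([^,>]+)")


def strip_chr_mapping():
    """Hypothetical mapping from 'chr'-prefixed names to unprefixed names."""
    mapping = {f"chr{i}": str(i) for i in range(1, 23)}
    mapping.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})
    return mapping


def rename_vcf_line(line, mapping):
    """Rename the contig in one VCF line; unmapped names are left unchanged."""
    if line.startswith("##contig="):
        return CONTIG_ID_RE.sub(
            lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
            line,
            count=1,
        )
    if line.startswith("#"):
        return line  # other header lines carry no CHROM field
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest
```

Renaming changes only the labels. If the two callsets were built on different reference builds, contig lengths and coordinates can still disagree after the names match, and a compressed, indexed VCF needs to be re-indexed after rewriting.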


Thank you Pierre. Exactly what I was looking for.

written 12 months ago by ilee66