Question: NA12878 High Confidence Callset Sequence Dictionaries
ilee66 wrote, 7 weeks ago:

I want to compare a WGS VCF callset that I produced following the GATK Best Practices with the NA12878 WGS gold-standard (high-confidence) VCF callset, but an error about differing sequence dictionary sizes is preventing me from performing any concordance analysis.

I downloaded the NA12878 "High Confidence Callset" (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/GIABPedigreev0.2/); I tried both this specific release and the one under "latest". When I compare this .vcf to a .vcf I produced, I get an error saying that the sequence dictionary sizes are different (command and error below).

From what I've gathered so far, this error likely arises from alignment against different reference genomes. I first got it after aligning and calling variants myself with the human_g1k_v37 reference genome (I have been unable to find out which reference genome these gold-standard VCF files were built on). I then downloaded the RMNISTHS_30xdownsample.bam file (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/). The README mentions that it was aligned with BWA-MEM but not against which reference genome. I assumed that this BAM from the NCBI FTP was aligned to the same reference genome as the VCF from the same FTP, but I still get the same error as before.

The GenotypeConcordance code that produces the error is as follows:

/path/to/gatk GenotypeConcordance -CV=/path/to/myinput.vcf.gz -O=/path/to/output.vcf -TV=/path/to/NIST_RTG_PlatGen_merged_highconfidence_v0.2_Allannotate.vcf.gz

The error:

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence Dictionaries are not the same size (25, 181)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:250)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:333)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:319)
        at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:350)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
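
The two numbers in the error (25 and 181) are presumably the number of sequences each file declares in its header. One way to see exactly which contigs differ is to compare the ##contig header lines of the two VCFs. The following is a minimal Python sketch, not part of GATK or Picard; the function names `parse_contigs`, `read_contigs`, and `compare_contigs` are made up for this illustration, and it assumes both files carry ##contig lines in their headers.

```python
import gzip
import re


def parse_contigs(lines):
    """Collect {contig_name: length or None} from the ##contig lines of a VCF header."""
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over (#CHROM line or first record)
        if line.startswith("##contig="):
            name = re.search(r"ID=([^,>]+)", line)
            length = re.search(r"length=(\d+)", line)
            if name:
                contigs[name.group(1)] = int(length.group(1)) if length else None
    return contigs


def read_contigs(path):
    """Open a plain or gzip/bgzip-compressed VCF and parse its contig lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_contigs(handle)


def compare_contigs(call_contigs, truth_contigs):
    """Report contigs unique to either file and shared contigs whose lengths differ."""
    call_names, truth_names = set(call_contigs), set(truth_contigs)
    return {
        "only_in_call": sorted(call_names - truth_names),
        "only_in_truth": sorted(truth_names - call_names),
        "length_mismatch": sorted(
            name
            for name in call_names & truth_names
            if call_contigs[name] != truth_contigs[name]
        ),
    }
```

For example, `compare_contigs(read_contigs("/path/to/myinput.vcf.gz"), read_contigs("/path/to/truth.vcf.gz"))` (placeholder paths) lists the contigs that appear in only one header. If the shared contigs agree in name and length and one file simply declares extra contigs, the two callsets may well be on compatible references with differently sized dictionaries.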

Does anyone know of a tool (within GATK or elsewhere) that disregards differing sequence dictionary lengths? An earlier version of GATK had an option for this, but I can't find it in GATK4. Any pointer would be awesome.

Thanks in advance for any ideas/help/advice

Tags: gatk, tool, alignment, vcf, genome
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7 weeks ago:

See the related post "changing of chromosome notation in CHROM columns of vcf file".
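
As a rough illustration of what changing the chromosome notation in a VCF involves, here is a minimal Python sketch that rewrites both the ##contig header IDs and the CHROM column. It is not the method from the linked post; the helper names and the "chr1" to "1" mapping (including "chrM" to "MT") are assumptions for this example, and the direction of the rename depends on which notation each file actually uses.

```python
import re


def build_chr_to_plain_mapping():
    """Hypothetical mapping from UCSC-style names (chr1) to Ensembl/b37-style names (1)."""
    mapping = {f"chr{i}": str(i) for i in range(1, 23)}
    mapping.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})
    return mapping


def rename_vcf_line(line, mapping):
    """Rename the contig in one VCF line; names missing from the mapping are left as-is."""
    if line.startswith("##contig="):
        # Rewrite the ID field of the header contig declaration.
        return re.sub(
            r"ID=([^,>]+)",
            lambda m: "ID=" + mapping.get(m.group(1), m.group(1)),
            line,
            count=1,
        )
    if line.startswith("#"):
        return line  # other header lines pass through unchanged
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest


def rename_vcf_lines(lines, mapping):
    """Lazily apply the rename to every line of a VCF."""
    for line in lines:
        yield rename_vcf_line(line, mapping)
```

Renaming changes only the contig names, not how many contigs each header declares, so it is worth comparing the two headers again afterwards. The output also needs to be re-compressed and re-indexed before a tool that expects an indexed .vcf.gz will accept it.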


ilee66 replied, 7 weeks ago:

Thank you, Pierre. Exactly what I was looking for.