Picard SortVcf Error Message
8.6 years ago
haiying.kong

I downloaded the SNP database from 1000 Genomes. When I try to sort the VCF file, I get an error message saying that the chromosome name is 1, but chr1 is expected. In fact, the chromosome names in my VCF file are chr1.

Here is the command line:

java -Xms10g -Xmx20g \
  -Djava.io.tmpdir=tmp \
  -jar ${Picard}picard.jar SortVcf \
  INPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf \
  OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf \
  SEQUENCE_DICTIONARY=${hg38}hg38.dict

Here is the log:

[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf INPUT=[ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf] OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf SEQUENCE_DICTIONARY=/home/kong/Haiying/Reference/hg38/hg38.dict    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Sep 22 17:04:12 CEST 2015] Executing as kong@hpc22 on Linux 3.2.53-1.el5.elrepo amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27; Picard version: 1.138(aa51703435dc6a423013e74e56b0b68405facd79_1439324166) IntelDeflater
[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=10290200576
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
        at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:112)
        at picard.vcf.SortVcf.doWork(SortVcf.java:81)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
        at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:165)
        at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:110)
        ... 4 more

And here is part of the vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    10177   rs367896724     A       AC      100     PASS    AC=2130;AF=0.425319;AN=5008;NS=2504;DP=103152;EAS_AF=0.3363;AMR_AF=0.3602;AFR_AF=0.4909;EUR_AF=0.4056;SAS_AF=0.4949;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10235   rs540431307     T       TA      100     PASS    AC=6;AF=0.00119808;AN=5008;NS=2504;DP=78015;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0.0051;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10352   rs555500075     T       TA      100     PASS    AC=2191;AF=0.4375;AN=5008;NS=2504;DP=88915;EAS_AF=0.4306;AMR_AF=0.4107;AFR_AF=0.4788;EUR_AF=0.4264;SAS_AF=0.4192;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10505   rs548419688     A       T       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9632;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1    10506   rs568405545     C       G       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9676;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1    10511   rs534229142     G       A       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9869;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
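The excerpt above shows only data lines. SortVcf, however, builds a sequence dictionary from the VCF header's ##contig lines and checks it against the file passed as SEQUENCE_DICTIONARY; in the log, the very first entries (dict_index=0) already differ. A minimal bash sketch for printing those first entries side by side, assuming an uncompressed VCF (the helper name show_first_contigs is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first contig declared in a VCF header and
# the first @SQ record of a Picard sequence dictionary, so the names and
# lengths that SortVcf compares can be checked by eye.
show_first_contigs() {
  local vcf="$1" dict="$2"
  # VCF header form: ##contig=<ID=1,...,length=249250621>
  grep -m1 '^##contig=' "$vcf"
  # .dict form (tab-separated): @SQ  SN:chr1  LN:248956422  ...
  grep -m1 '^@SQ' "$dict"
}
```

For example: `show_first_contigs ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf ${hg38}hg38.dict`.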
Tags: software-error, picard

You know what the problem is. What is your real question?


I don't know what the problem is.

In my VCF file, the chromosomes are named chr1, chr2, ...

But when I run the command, I keep getting an error message that says:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.

This is a common problem: the chromosome names in your VCF and in the sequence dictionary differ by the chr prefix. See, e.g., the thread "VCF files: Change Chromosome Notation".


I did add chr to the chromosome names with sed, so my VCF file and dict file both have chromosome names like

chr1, chr2, chr3, chr4, ...
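The error still reports name=1 and assembly=b37 on the VCF side, which is what the VCF header's ##contig lines declare, so the sed command most likely changed only the data lines. A minimal sketch that adds the prefix to both, assuming an uncompressed VCF with bare chromosome names (add_chr_prefix is a hypothetical helper name). It only renames: MT becomes chrMT (not UCSC's chrM), and coordinates are untouched, so it cannot make a GRCh37 file match a GRCh38 dictionary. The lengths in the error (249,250,621 vs 248,956,422 for chromosome 1) indicate that a build mismatch is what happened in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix chromosome names with "chr" in BOTH the
# ##contig header lines and the data lines of an uncompressed VCF.
# Assumes bare names (1..22, X, Y, MT); MT becomes chrMT, not chrM.
# Renaming does not convert between genome builds.
add_chr_prefix() {
  sed -e 's/^##contig=<ID=/##contig=<ID=chr/' \
      -e '/^#/!s/^/chr/' \
      "$1"
  # Expression 1: rewrite the ID of every ##contig header line.
  # Expression 2: prefix every non-header line (the #CHROM line is skipped).
}
```

For example: `add_chr_prefix in.vcf > in_chr.vcf`.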


Hello haiying.kong!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62900

This is typically not recommended as it runs the risk of annoying people in both communities.


I did not realize this could be annoying. I was trying to get as many responses as possible.

I tried to delete the post on SEQanswers, but it seems it cannot be deleted, so I shortened it to a brief reference to this one.

8.6 years ago
haiying.kong

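It turned out that the reference genome I was using (hg38) is newer than the build the 1000 Genomes phase 3 VCF was made on (GRCh37/b37). The error message shows this: the VCF header declares chromosome 1 as 1 with length 249,250,621 (b37), while hg38.dict has chr1 with length 248,956,422. Adding chr with sed apparently did not change the ##contig header lines, and renaming alone could not fix a genome-build mismatch anyway.

To be safe, the best approach is to use reference files that match the build of the data, e.g. all reference files downloaded from ftp://ftp.broadinstitute.org/bundle/2.8/b37/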
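Before re-running SortVcf with the b37 files, one way to confirm that the VCF header and the dictionary agree is to compare their contig names and lengths in order. A minimal bash sketch, assuming an uncompressed VCF and no commas inside ##contig attribute values; the helper names are hypothetical, and the check only approximates the one htsjdk performs (which may also consider other attributes such as MD5 when present):

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names, not part of Picard or GATK).

# Emit "name<TAB>length" for every ##contig line in a VCF header.
# Stops reading at the #CHROM line so the data lines are skipped.
vcf_contigs() {
  sed -n -e '/^#CHROM/q' -e 's/^##contig=<\(.*\)>$/\1/p' "$1" |
    awk -F',' '{
      id = ""; len = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^ID=/)     id  = substr($i, 4)
        if ($i ~ /^length=/) len = substr($i, 8)
      }
      print id "\t" len
    }'
}

# Emit "name<TAB>length" for every @SQ line in a Picard .dict file.
dict_contigs() {
  awk -F'\t' '$1 == "@SQ" {
      sn = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) sn  = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4)
      }
      print sn "\t" len
    }' "$1"
}

# Compare both ordered lists; diff prints any differences and exits
# non-zero when the VCF header and the dictionary disagree.
compare_contigs() {
  diff <(vcf_contigs "$1") <(dict_contigs "$2")
}
```

For example, `compare_contigs in.vcf ref.dict && echo "dictionaries agree"` prints nothing but the confirmation when every name and length matches.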
