Question: Picard SortVcf Error Message
haiying.kong (Germany) wrote, 3.4 years ago:

I downloaded the SNP database from the 1000 Genomes Project. When I try to sort the VCF file, I get an error message saying that the chromosome name is 1 but chr1 was expected. In fact, the chromosome name in my VCF file is chr1.

Here is the command line:

java -Xms10g -Xmx20g -Djava.io.tmpdir=tmp -jar ${Picard}picard.jar SortVcf INPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf SEQUENCE_DICTIONARY=${hg38}hg38.dict

Here is the log:

[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf INPUT=[ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf] OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf SEQUENCE_DICTIONARY=/home/kong/Haiying/Reference/hg38/hg38.dict    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Sep 22 17:04:12 CEST 2015] Executing as kong@hpc22 on Linux 3.2.53-1.el5.elrepo amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27; Picard version: 1.138(aa51703435dc6a423013e74e56b0b68405facd79_1439324166) IntelDeflater
[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=10290200576
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
        at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:112)
        at picard.vcf.SortVcf.doWork(SortVcf.java:81)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
        at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:165)
        at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:110)
        ... 4 more

 

And here is part of the VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    10177   rs367896724     A       AC      100     PASS    AC=2130;AF=0.425319;AN=5008;NS=2504;DP=103152;EAS_AF=0.3363;AMR_AF=0.3602;AFR_AF=0.4909;EUR_AF=0.4056;SAS_AF=0.4949;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10235   rs540431307     T       TA      100     PASS    AC=6;AF=0.00119808;AN=5008;NS=2504;DP=78015;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0.0051;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10352   rs555500075     T       TA      100     PASS    AC=2191;AF=0.4375;AN=5008;NS=2504;DP=88915;EAS_AF=0.4306;AMR_AF=0.4107;AFR_AF=0.4788;EUR_AF=0.4264;SAS_AF=0.4192;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1    10505   rs548419688     A       T       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9632;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1    10506   rs568405545     C       G       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9676;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1    10511   rs534229142     G       A       100     PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9869;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
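The data lines above already use chr1, yet the exception reports a record with name=1 and assembly=b37. One possible explanation, offered here as an assumption and not confirmed in this thread, is that the ##contig lines in the VCF header still carry the original b37 names and lengths. The following is a minimal sketch for checking that; the function names are hypothetical, and it only assumes the usual `##contig=<ID=...,length=...>` header layout and tab-separated `@SQ` lines with `SN:` and `LN:` fields in the .dict file.

```python
import re


def vcf_header_contigs(lines):
    """Return {contig_id: length or None} from ##contig header lines."""
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM line
        if line.startswith("##contig="):
            m_id = re.search(r"[<,]ID=([^,>]+)", line)
            m_len = re.search(r"[<,]length=(\d+)", line)
            if m_id:
                contigs[m_id.group(1)] = int(m_len.group(1)) if m_len else None
    return contigs


def dict_contigs(lines):
    """Return {name: length} from the @SQ lines of a sequence dictionary."""
    contigs = {}
    for line in lines:
        if line.startswith("@SQ"):
            tags = line.rstrip("\n").split("\t")[1:]
            fields = dict(tag.split(":", 1) for tag in tags)
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare(vcf_contigs, ref_contigs):
    """List (name, vcf_length, dict_length) for contigs that do not match."""
    return [
        (name, length, ref_contigs.get(name))
        for name, length in vcf_contigs.items()
        if ref_contigs.get(name) != length
    ]
```

Passing the opened VCF and .dict files to these functions (for example `compare(vcf_header_contigs(open(vcf_path)), dict_contigs(open(dict_path)))`, where both paths are placeholders) would list every header contig whose name or length disagrees with the dictionary.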

 

 


You already know what the problem is. What is your real question?

written 3.4 years ago by Pierre Lindenbaum

I don't know what the problem is.

In my VCF file, the chromosomes are named chr1, chr2, and so on.

But when I run the command, I keep getting this error message:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.

written 3.4 years ago by haiying.kong

This is a common problem: your VCF has a prefix in the chromosome names. See, e.g., "VCF files: Change Chromosome Notation".

written 3.4 years ago by Pierre Lindenbaum

I actually did add chr to the chromosome names with sed, so my VCF file and my dict file both have chromosome names like

chr1, chr2, chr3, chr4, ...

written 3.4 years ago by haiying.kong
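The exact sed command is not shown in the thread, so the following is only a sketch of one way a rename could cover both the data lines and the ##contig header lines; a substitution anchored on data lines alone would leave header entries such as ID=1 unchanged. The function name `add_chr_prefix` is hypothetical. Note also that the two lengths in the error message differ (249250621 versus 248956422), and renaming contigs does not change coordinates or lengths, so this alone would not make a b37 file match an hg38 dictionary.

```python
import re


def add_chr_prefix(line):
    """Add a 'chr' prefix to the contig name on one VCF line."""
    if not line.strip():
        return line  # leave blank lines untouched
    if line.startswith("##contig="):
        # Rewrite ID=1 to ID=chr1 inside the ##contig=<...> header line,
        # unless the ID already starts with chr.
        return re.sub(r"([<,]ID=)(?!chr)", r"\1chr", line, count=1)
    if line.startswith("#") or line.startswith("chr"):
        return line  # other header lines, or records already prefixed
    return "chr" + line
```

Applied line by line to a VCF, this keeps the header and the records consistent with each other. It does not handle naming differences that go beyond a prefix, such as MT versus chrM.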

Hello haiying.kong!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62900

This is typically not recommended as it runs the risk of annoying people in both communities.

written 3.4 years ago by Pierre Lindenbaum

I did not realize this could be annoying. I was trying to get as many responses as possible.

I tried to delete the post on SEQanswers, but it seems it cannot be deleted, so I cut it down to a short reference.

written 3.4 years ago by haiying.kong
haiying.kong (Germany) wrote, 3.4 years ago:

It turned out that the reference genome data I was using (hg38) is too new and is not supported by GATK or other Broad Institute software tools.

To be safe, the best approach is to use reference data files that are all downloaded from

ftp://ftp.broadinstitute.org/bundle/2.8/b37/
