9.5 years ago
haiying.kong
I downloaded the SNP database from the 1000 Genomes Project. When I try to sort the VCF file, I get an error message saying that the chromosome name is 1 but chr1 was expected. In fact, the chromosome in my VCF file is named chr1.
Here is the command line:
java -Xms10g -Xmx20g \
-Djava.io.tmpdir=tmp \
-jar ${Picard}picard.jar SortVcf \
INPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf \
OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf \
SEQUENCE_DICTIONARY=${hg38}hg38.dict
Here is the log:
[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf INPUT=[ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr.vcf] OUTPUT=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites_chr_sorted.vcf SEQUENCE_DICTIONARY=/home/kong/Haiying/Reference/hg38/hg38.dict VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Sep 22 17:04:12 CEST 2015] Executing as kong@hpc22 on Linux 3.2.53-1.el5.elrepo amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27; Picard version: 1.138(aa51703435dc6a423013e74e56b0b68405facd79_1439324166) IntelDeflater
[Tue Sep 22 17:04:12 CEST 2015] picard.vcf.SortVcf done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=10290200576
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:112)
at picard.vcf.SortVcf.doWork(SortVcf.java:81)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=249250621,dict_index=0,assembly=b37) was found when SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=null) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:165)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:110)
... 4 more
And here is part of the vcf file:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10177 rs367896724 A AC 100 PASS AC=2130;AF=0.425319;AN=5008;NS=2504;DP=103152;EAS_AF=0.3363;AMR_AF=0.3602;AFR_AF=0.4909;EUR_AF=0.4056;SAS_AF=0.4949;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1 10235 rs540431307 T TA 100 PASS AC=6;AF=0.00119808;AN=5008;NS=2504;DP=78015;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0.0051;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1 10352 rs555500075 T TA 100 PASS AC=2191;AF=0.4375;AN=5008;NS=2504;DP=88915;EAS_AF=0.4306;AMR_AF=0.4107;AFR_AF=0.4788;EUR_AF=0.4264;SAS_AF=0.4192;AA=|||unknown(NO_COVERAGE);VT=INDEL
chr1 10505 rs548419688 A T 100 PASS AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9632;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1 10506 rs568405545 C G 100 PASS AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9676;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
chr1 10511 rs534229142 G A 100 PASS AC=1;AF=0.000199681;AN=5008;NS=2504;DP=9869;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
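One thing the excerpt does not show is the VCF header. The exception compares sequence dictionaries, and the record it reports (name=1, assembly=b37) would normally come from the ##contig lines in the header rather than from the data lines. Below is a minimal Python sketch for comparing the two. It assumes the header uses standard ##contig=<ID=...,length=...> lines and the .dict file uses tab-separated @SQ lines. The function names are made up for illustration, and the sample lines are reconstructed from the values in the log, not copied from the actual files.

```python
import re


def vcf_header_contigs(lines):
    """Return [(name, length)] parsed from the ##contig lines of a VCF header."""
    contigs = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if line.startswith("##contig=<"):
            name = re.search(r"ID=([^,>]+)", line)
            length = re.search(r"length=(\d+)", line)
            contigs.append((
                name.group(1) if name else None,
                int(length.group(1)) if length else None,
            ))
    return contigs


def dict_contigs(lines):
    """Return [(name, length)] parsed from the @SQ lines of a .dict file."""
    contigs = []
    for line in lines:
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def first_mismatch(vcf_contigs, ref_contigs):
    """Return (index, vcf_entry, dict_entry) for the first difference, or None."""
    for i, (a, b) in enumerate(zip(vcf_contigs, ref_contigs)):
        if a != b:
            return i, a, b
    return None


if __name__ == "__main__":
    # Hypothetical sample lines built from the values in the error message.
    vcf_lines = [
        "##fileformat=VCFv4.1\n",
        "##contig=<ID=1,assembly=b37,length=249250621>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t10177\trs367896724\tA\tAC\t100\tPASS\t.\n",
    ]
    dict_lines = ["@HD\tVN:1.5\n", "@SQ\tSN:chr1\tLN:248956422\n"]
    print(first_mismatch(vcf_header_contigs(vcf_lines), dict_contigs(dict_lines)))
```

For real files, the same functions can be given open text file handles instead of lists. If the header still lists ID=1 while the data lines say chr1, that would be consistent with the error above.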
You know what the problem is. What is your real question?
I don't know what the problem is.
In my VCF file, the chromosomes are named chr1, chr2, and so on.
But if I run the command, I keep getting the error message that says
This is a common problem: your VCF has a prefix in the chromosome names. See, e.g., the post "VCF files: Change Chromosome Notation".
I actually did add chr to the chromosome names with sed, so my VCF file and dict file both have chromosome names like
chr1, chr2, chr3, chr4, ...
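A sed pattern that only matches lines starting with a chromosome name would rename the data lines but leave the ##contig lines in the header untouched, which could explain why name=1 is still reported. Here is a hedged Python sketch of a rename that covers both. The function name add_chr_prefix is hypothetical, and it assumes a plain "chr" prefix is all that is needed (it does not handle cases such as MT versus chrM).

```python
import re


def add_chr_prefix(line, prefix="chr"):
    """Add `prefix` to the contig name on a ##contig header line or a data line."""
    if line.startswith("##contig=<"):
        # Rewrite ID=1 to ID=chr1, skipping IDs that already carry the prefix.
        pattern = r"ID=(?!%s)" % re.escape(prefix)
        return re.sub(pattern, "ID=" + prefix, line, count=1)
    if line.startswith("#"):
        return line  # other header lines, including #CHROM, stay as they are
    if line.startswith(prefix):
        return line  # data line already renamed
    return prefix + line


if __name__ == "__main__":
    # Hypothetical sample lines; the contig line is built from values in the log.
    for sample in [
        "##contig=<ID=1,assembly=b37,length=249250621>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t10177\trs367896724\tA\tAC\t100\tPASS\t.\n",
        "chr1\t10235\trs540431307\tT\tTA\t100\tPASS\t.\n",
    ]:
        print(add_chr_prefix(sample), end="")
```

Note that the log also shows different lengths for the two records (249250621 with assembly=b37 versus 248956422 in the hg38 dictionary), so renaming alone may not be enough to make the dictionaries match.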
Hello haiying.kong!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62900
This is typically not recommended as it runs the risk of annoying people in both communities.
I did not realize this could be annoying. I was trying to get as many responses as possible.
I tried to delete the post on SEQanswers, but it seems it cannot be deleted, so I shortened it to a brief reference.