Can GATK read compressed vcf files?
7.6 years ago
Apprentice ▴ 160

Hi!

I have one question about GATK CombineVariants.

Could you tell me whether GATK CombineVariants can load compressed vcf files?

The command below returned an error.

java -jar GenomeAnalysisTK.jar   -T CombineVariants   -R reference.fasta   --variant input1.vcf.gz  --variant input2.vcf.gz   -o output.vcf
sequence SNP genome • 6.1k views

What error did you get?

The message is shown below:

------------------------------------------------------------------------------------------
Done. There were 2 WARN messages, the first 2 are repeated below.
WARN  18:47:51,038 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  18:47:51,039 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation 
------------------------------------------------------------------------------------------

I'm afraid it's not a vcf file. It looks like a file downloaded from the web. Anyway, vcf.gz files must be compressed with bgzip and indexed with tabix.

What's the output of:

file input1.vcf.gz   input2.vcf.gz

?

The output was

input1.vcf.gz: gzip compressed data, extra field
input2.vcf.gz: gzip compressed data, extra field

OK, so that's correct. The "extra field" flag is consistent with bgzip output, so these are clearly bgzipped files.
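For reference, the same check can be done without `file` by looking at the first bytes of the archive. This is only a minimal sketch, assuming the BGZF header layout described in the SAM/BAM specification (gzip magic bytes with the FEXTRA flag set, followed by a `BC` subfield of length 2 at the start of the extra field). The function name `is_bgzf` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the file begins with a BGZF block header.
# Assumption: the "BC" subfield is the first entry in the gzip extra field,
# which is how bgzip normally writes it; other writers could order it differently.
is_bgzf() {
    # Read the first 16 bytes as a single lowercase hex string.
    hex=$(od -An -tx1 -N16 "$1" | tr -d ' \n')
    # Bytes 0-3: ID1=1f ID2=8b CM=08 (deflate) FLG=04 (FEXTRA set).
    magic=$(printf '%s' "$hex" | cut -c1-8)
    # Bytes 12-15: SI1='B' (42) SI2='C' (43) SLEN=2 (02 00, little-endian).
    subfield=$(printf '%s' "$hex" | cut -c25-32)
    [ "$magic" = "1f8b0804" ] && [ "$subfield" = "42430200" ]
}

# Example use (file names taken from the question above):
#   is_bgzf input1.vcf.gz && echo "input1.vcf.gz: BGZF"
```

A file that fails this check was most likely written by plain gzip; in that case it can be decompressed, recompressed with `bgzip`, and indexed with `tabix -p vcf`.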

Can you now show me the output of:

gunzip -c input1.vcf.gz | grep -v "#" | head -n 2