Combine a directory of GVCF files with gatk CombineGVCFs
2.4 years ago

I've produced a set of about 400 GVCF files with gatk HaplotypeCaller, using the -ERC GVCF option. I'd now like to combine them for downstream genotyping and variant recalibration. I believe I can do this with gatk CombineGVCFs:

gatk CombineGVCFs \
   -R reference.fasta \
   --variant sample1.g.vcf.gz \
   --variant sample2.g.vcf.gz \
   -O cohort.g.vcf.gz

What I don't know is how to pass all 400 GVCF files to CombineGVCFs. I've heard this can be done with the --arguments_file option, but I don't know how to build such a file.

Any help gratefully received!

2.4 years ago

Create a file with the .list suffix containing the paths to your GVCF files, one per line:

find /path/to/dir -type f -name "*.vcf.gz" > input.list

Then pass this file to the '-V' (--variant) argument:

(...)    --variant input.list
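
Before launching a long run, it can help to check that the list contains what you expect. The following is a minimal sketch, not part of the original answer: the function name build_gvcf_list and the throwaway demo directory are hypothetical, and it assumes the GVCFs end in .g.vcf.gz so that index files and other VCFs in the same directory are not picked up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a sorted list of GVCF paths, one per line.
# Matching only *.g.vcf.gz skips .tbi indexes and any non-GVCF files.
build_gvcf_list() {
    local dir="$1" out="$2"
    find "$dir" -type f -name "*.g.vcf.gz" | sort > "$out"
}

# Demonstration on a throwaway directory holding empty placeholder files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sample1.g.vcf.gz" "$demo_dir/sample2.g.vcf.gz" \
      "$demo_dir/sample2.g.vcf.gz.tbi" "$demo_dir/notes.txt"

build_gvcf_list "$demo_dir" "$demo_dir/input.list"

# Count the entries; for the real data this should equal the number of samples.
n_listed="$(wc -l < "$demo_dir/input.list" | tr -d ' ')"
echo "listed $n_listed GVCF files"
```

With real data, replace the demo directory with the directory holding the GVCFs and compare the count against the expected number of samples (about 400 here) before passing the list to --variant.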

Thank you, that seems to allow CombineGVCFs to run.

gatk CombineGVCFs \
   --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
   -R "$GKREF"/Homo_sapiens_assembly38.fasta \
   --variant "$OUT"/temp_gvcf_2/input.list \
   -O "$OUT"/temp_gvcf_2/cohort.g.vcf

However, the resulting VCF file seems to have only one sample in it. The columns in the output are as follows:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  M00819_262_000000000-BP6CG_1_
chr1    12046   .   G   <NON_REF>   .   .   END=12410   GT:DP:GQ:MIN_DP:PL  ./.:0:0:0:0,0,0

I'd expected a column of genotypes for each sample (i.e. 400 columns).
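
One way to narrow this down is to compare the number of files in the list with the number of distinct sample names declared in their headers. The sketch below is an illustration only: the helper sample_names and the two tiny demo files are hypothetical, and the thread does not confirm what caused the single-column output. It reads the #CHROM header line of each gzip-compressed file, where the sample columns start at field 10.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the sample column names of a gzip-compressed VCF,
# i.e. fields 10 onward of the tab-separated #CHROM header line.
sample_names() {
    gzip -cd "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Demonstration: two header-only GVCF stand-ins with different sample names.
demo_dir="$(mktemp -d)"
for s in sampleA sampleB; do
    printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t%s\n' "$s" \
        | gzip -c > "$demo_dir/$s.g.vcf.gz"
done
find "$demo_dir" -type f -name "*.g.vcf.gz" | sort > "$demo_dir/input.list"

# Number of files in the list versus number of distinct sample names.
n_files="$(wc -l < "$demo_dir/input.list" | tr -d ' ')"
n_unique="$(while read -r f; do sample_names "$f"; done < "$demo_dir/input.list" \
    | sort -u | wc -l | tr -d ' ')"
echo "files in list: $n_files, distinct sample names: $n_unique"
```

If the list is shorter than expected, the find pattern missed files. If the list is complete but the number of distinct names is smaller than the number of files, several inputs declare the same sample name, which is worth ruling out before looking elsewhere.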


This is unrelated to the original question. Please validate the answer.


Done, thank you.
