Error with CombineGVCFs when joining a large number of files
12 months ago
rj.rezwan • 0

Hi, I have *.g.vcf output files for 64 different accessions, produced by HaplotypeCaller in GATK. I now want to combine them into one file with CombineGVCFs, but I cannot get the output and the job fails with an error. Can someone suggest the best way to combine them, or is there a known issue with CombineGVCFs when combining many sample files? The code is here:

#!/bin/bash
#
#SBATCH --job-name=combine_files
#SBATCH --output=combine_files.%j.out
#SBATCH --partition=batch
#SBATCH --cpus-per-task=20
#SBATCH --time=100:00:00
#SBATCH --mem=600G

module load gatk/4.1.2.0

ref_dir=(~/path/PitayaGenomic.fa)

gvcf_dir=(~/path/*.g.vcf)

gatk CombineGVCFs -R ${ref_dir} $(printf -- '--variant %s ' "${gvcf_dir[@]}") -O joint_files.g.vcf

Without the error message, it is hard to say where the problem is. You could try increasing the Java heap size (for example, --java-options "-Xmx50g"). As a side note, GATK recommends using GenomicsDBImport instead of CombineGVCFs, especially when dealing with large numbers of files (more than 1000), but I don't think that applies here.
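To make the memory suggestion concrete, here is a minimal sketch of how --java-options could be added to the command from the question. The file names and the 50g heap value are placeholders, not values taken from the original job. Requesting memory from SLURM with --mem does not by itself set the Java heap, so the heap is passed explicitly. The script only prints the command so it can be inspected first.

```shell
#!/bin/bash
# Sketch: build a CombineGVCFs command with an explicit Java heap size.
# All paths below are placeholders; the heap value is illustrative only.
ref="PitayaGenomic.fa"
gvcfs=(sample1.g.vcf sample2.g.vcf)
heap="50g"

# Collect the command in an array so every argument stays correctly quoted.
cmd=(gatk --java-options "-Xmx${heap}" CombineGVCFs -R "$ref")
for g in "${gvcfs[@]}"; do
    cmd+=(--variant "$g")
done
cmd+=(-O joint_files.g.vcf.gz)

# Print the command for inspection; run "${cmd[@]}" instead to execute it.
printf '%s ' "${cmd[@]}"
echo
```

Keeping the heap somewhat below the memory requested from the scheduler leaves room for non-heap JVM memory.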


Sure, I will post a comment within a few days with some suitable answers that would be useful for new users.


Not just a comment. Validate (green check mark on the left) the good answers to your previous questions.

12 months ago

Instead of

gvcf_dir=(~/path/*.g.vcf)
gatk CombineGVCFs -R ${ref_dir} $(printf -- '--variant %s ' "${gvcf_dir[@]}") -O joint_files.g.vcf

use

find ~/path/ -type f -name "*.g.vcf" > gvcfs.list
gatk CombineGVCFs -R "${ref_dir}"  --variant gvcfs.list -O joint_files.g.vcf.gz
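As I understand it, this works because GATK4 reads a file with a .list extension passed to --variant as a list of input paths, one per line. Before submitting a long job it may help to check that the list actually contains the expected files. A small sketch follows; the helper name make_gvcf_list is hypothetical and not part of GATK.

```shell
#!/bin/bash
# Hypothetical helper (not part of GATK): write the paths of all *.g.vcf
# files under a directory to a list file, sorted for reproducibility.
# Prints the number of files found and returns non-zero if there are none.
make_gvcf_list() {
    local dir="$1" out="$2" n
    find "$dir" -type f -name "*.g.vcf" | sort > "$out"
    # wc pads its output with spaces on some platforms, so strip whitespace.
    n=$(wc -l < "$out" | tr -d '[:space:]')
    echo "$n"
    [ "$n" -gt 0 ]
}

# Example usage (paths are placeholders, as in the thread):
# n=$(make_gvcf_list ~/path gvcfs.list) && echo "found $n gVCFs"
# gatk CombineGVCFs -R "${ref_dir}" --variant gvcfs.list -O joint_files.g.vcf.gz
```

For the 64 accessions in the question, the printed count should be 64; any other number suggests a path or file-name mismatch.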