I am learning how an NGS workflow works. I am using GATK3 at the moment and I have the following issue.
After calling variants, I extract SNP, INDEL and MIXED variants (no problem up to here), but when I try to combine the filtered variants with CombineVariants I get this error:
##### ERROR MESSAGE: CombineVariants should not be used to merge gVCFs produced by the HaplotypeCaller; use CombineGVCFs
I have checked the three input VCF files (SNP.vcf, INDELs.vcf and MIXED.vcf) and none of them is a gVCF. They all come from the same sample; I am running this with a single patient sample.
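For reference, this kind of check can be sketched as follows. It is only a minimal sketch, assuming that a gVCF-style record can be recognized by the `<NON_REF>` symbolic allele in the ALT column. The helper name `count_non_ref_records` is hypothetical, and I have not confirmed that CombineVariants uses the same criterion.

```shell
# Hypothetical helper: count data records whose ALT column (field 5)
# contains the <NON_REF> symbolic allele. Header lines (starting with #)
# are skipped. Prints 0 when no such record exists.
count_non_ref_records() {
    awk -F'\t' '!/^#/ && $5 ~ /<NON_REF>/ { n++ } END { print n + 0 }' "$1"
}

# Example usage (paths taken from the commands in this post):
# for f in ../SNP_Filtered.vcf ../INDELs_Filtered.vcf ../MIXED_Filtered.vcf; do
#     printf '%s\t%s\n' "$f" "$(count_non_ref_records "$f")"
# done
```

A non-zero count for any file would suggest that it still carries reference-confidence records.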
Why do I get this error? Also, if I follow the recommendation and use CombineGVCFs, I get another error telling me that genotypemergeoption is not defined, when it actually is.
I have pasted my code below, from HaplotypeCaller to CombineVariants, in case the error comes from an earlier step where the VCF files are created.
Note 1: There are no variants in SNP.vcf and INDELs.vcf with the data I am using, but there are many mixed variants in the MIXED file.
Note 2: I know GATK3 is not the latest version. I am learning with the version we currently use in the lab, but we are now implementing GATK4.
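The record counts mentioned in Note 1 can be obtained with something like the following. This is a rough sketch, assuming uncompressed VCF files; the helper name `count_records` is hypothetical.

```shell
# Hypothetical helper: count non-header lines (data records) in a VCF.
# grep -c exits with status 1 when the count is 0, so "|| true" keeps
# the function usable in scripts that run under "set -e".
count_records() {
    grep -vc '^#' "$1" || true
}

# Example usage (paths taken from the commands in this post):
# for f in ../SNP.vcf ../INDELs.vcf ../MIXED.vcf; do
#     printf '%s\t%s\n' "$f" "$(count_records "$f")"
# done
```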
# Variant calling with HaplotypeCaller
java -Xmx4000m "$javatmp" -jar "$gatkpath" \
-T HaplotypeCaller \
-R "$refgenome4HaplotypeCaller" \
-dbsnp "$dbsnpfile" \
-I ../final_bam.bam \
-L "$bed4HaplotypeCaller" \
-o ../vcf.vcf \
--genotyping_mode DISCOVERY \
-stand_call_conf 30 \
--emitRefConfidence BP_RESOLUTION \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-ip 100 \
-dt NONE
I then run the following processing steps:
# Extract SNPs
java -Xmx8000m "$javatmp" -jar "$gatkpath" \
-T SelectVariants \
-R "$refgenome4HaplotypeCaller" \
-V ../vcf.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../SNP.vcf \
-selectType SNP \
-dt NONE
# Filter SNPs
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T VariantFiltration \
-R "$refgenome4HaplotypeCaller" \
-V ../SNP.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../SNP_Filtered.vcf \
--filterExpression "QD < 2.0" \
--filterName "LowQD" \
--filterExpression "FS > 60.0" \
--filterName "SB" \
--filterExpression "MQ < 40.0" \
--filterName "LowMQ" \
--filterExpression "MQRankSum < -12.5" \
--filterName "MQRankSum" \
--filterExpression "ReadPosRankSum < -8.0" \
--filterName "ReadPosRankSum" \
-dt NONE
# Extract INDELs
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T SelectVariants \
-R "$refgenome4HaplotypeCaller" \
-V ../vcf.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../INDELs.vcf \
-selectType INDEL \
-dt NONE
# Filter INDELs
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T VariantFiltration \
-R "$refgenome4HaplotypeCaller" \
-V ../INDELs.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../INDELs_Filtered.vcf \
--filterExpression "QD < 2.0" \
--filterName "LowQD" \
--filterExpression "FS > 200.0" \
--filterName "SB" \
--filterExpression "ReadPosRankSum < -20.0" \
--filterName "ReadPosRankSum" \
-dt NONE
# Extract MIXED
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T SelectVariants \
-R "$refgenome4HaplotypeCaller" \
-V ../vcf.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../MIXED.vcf \
-selectType MIXED \
-dt NONE
# Filter MIXED
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T VariantFiltration \
-R "$refgenome4HaplotypeCaller" \
-V ../MIXED.vcf \
-L "$bed4HaplotypeCaller" \
-ip 100 \
-o ../MIXED_Filtered.vcf \
--filterExpression "QD < 2.0" \
--filterName "LowQD" \
--filterExpression "FS > 60.0" \
--filterName "SB" \
-dt NONE
# Combine filtered variants
java -Xmx2000m "$javatmp" -jar "$gatkpath" \
-T CombineVariants \
-R "$refgenome4HaplotypeCaller" \
-o ../Combined.vcf \
--variant ../SNP_Filtered.vcf \
--variant ../INDELs_Filtered.vcf \
--variant ../MIXED_Filtered.vcf \
-dt NONE \
--genotypemergeoption UNSORTED