Entering edit mode
4.5 years ago
Satyajeet Khare ★ 1.6k
I am performing trio analysis with Exome sequencing data. I am using GATK 3.8. I am at VariantRecalibrator stage. Following is my command...
java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -input /gatk_out/output.vcf --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /gatk/resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /gatk/resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /gatk/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /ncbi/00-All.vcf.gz -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile /gatk_out/output.recal -tranchesFile /gatk_out/output.tranches -rscriptFile /gatk_out/output.plots.R
...which is running into an error
##### ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
Here, output.vcf is the output of GenotypeGVCFs command run on g.vcf files from mother, father and the child. The hapmap, omni and 1000G files are downloaded from gatk resource. The dbsnp file is downloaded from ncbi (dbsnp file from gatk resource gave an error).
I tried running VariantAnnotator in this manner...
java -jar GenomeAnalysisTK.jar -T VariantAnnotator -A QualByDepth -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -V /gatk_out/output.vcf -o /gatk_out/output_annotated.vcf
...changing output.vcf to output_annotated.vcf in the VariantRecalibrator command.
But now, a new error appears
##### ERROR MESSAGE: Invalid command line: Argument annotation has a bad value: Annotation QD was not found; please check that you have specified the annotation name correctly
Have you validated the input VCFs with ValidateVariants?
Thanks @toralmanvar. I tested the input vcf file like this
The command finished the run without any errors. But the output is confusing. The log file reads following...
But the result file generated in the home directory says
Not sure why 0 records!
Mine is a small dataset (4 samples). Is that the problem? Should I just shift to hard filtration such as this?
That's unusual. You might have gone through this GATK forum, but if not, see whether it can be useful to you.
Thanks, I will try running without
-an QDparameter, but here are a couple of other peculiar observations.
Could this be an incompatibility issue between gatk VCF files and the genome fasta file (
Homo_sapiens.GRCh38.dna.primary_assembly.fa) downloaded from NCBI?
Note: Validate variants works perfectly fine with
00-All.vcf.gzfile which is also downloaded from NCBI and I see millions of processed sites.
Usually, in case of incompatibility of reference genome you end up with this kind of problems. But this not the case here. But give a try using reference genome from this link for 1000G VCF files.
The issue is resolved. I made two changes. I don't know which one worked.
Picardoutput directly for BQSR. I corrected the pipeline by incorporating that step.
Don't know which one worked. Upvoted your VCF validation comment. That gave the clue!