Question: ERROR MESSAGE: QD annotation at VariantRecalibrator
0
gravatar for Satyajeet Khare
11 months ago by
Satyajeet Khare1.5k
Pune, India
Satyajeet Khare1.5k wrote:

I am performing trio analysis with Exome sequencing data. I am using GATK 3.8. I am at VariantRecalibrator stage. Following is my command...

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -input /gatk_out/output.vcf --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /gatk/resources_broad_hg38_v0_hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /gatk/resources_broad_hg38_v0_1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /gatk/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /ncbi/00-All.vcf.gz -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile /gatk_out/output.recal -tranchesFile /gatk_out/output.tranches -rscriptFile /gatk_out/output.plots.R

...which is running into an error

##### ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

Here, output.vcf is the output of GenotypeGVCFs command run on g.vcf files from mother, father and the child. The hapmap, omni and 1000G files are downloaded from gatk resource. The dbsnp file is downloaded from ncbi (dbsnp file from gatk resource gave an error).

I tried running VariantAnnotator in this manner...

 java -jar GenomeAnalysisTK.jar -T VariantAnnotator -A QualByDepth -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -V /gatk_out/output.vcf -o /gatk_out/output_annotated.vcf

...changing output.vcf to output_annotated.vcf in the VariantRecalibrator command.

But now, a new error appears

##### ERROR MESSAGE: Invalid command line: Argument annotation has a bad value: Annotation QD was not found; please check that you have specified the annotation name correctly

Any inputs?

snp variantrecalibrator gatk • 713 views
ADD COMMENTlink modified 10 months ago by Biostar ♦♦ 20 • written 11 months ago by Satyajeet Khare1.5k
1

Have you validated the input VCFs with ValidateVariants?

ADD REPLYlink written 11 months ago by toralmanvar820

Thanks @toralmanvar. I tested the input vcf file like this

java -jar GenomeAnalysisTK.jar -T ValidateVariants --reference_window_stop >= 300 -R Homo_sapiens.GRCh38.dna.primary_assembly.fa --variant:VCF gatk_out/output_annotated.vcf

The command finished the run without any errors. But the output is confusing. The log file reads following...

INFO  09:54:43,442 ProgressMeter -          | processed |    time |    per 1M |           |   total | remaining
INFO  09:54:43,443 ProgressMeter -           Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  09:55:01,039 ProgressMeter -              done   1389871.0    17.0 s      12.0 s      100.0%    17.0 s       0.0 s
INFO  09:55:01,039 ProgressMeter - Total runtime 17.60 secs, 0.29 min, 0.00 hours

But the result file generated in the home directory says

Successfully validated the input file.  Checked 0 records with no failures.

Not sure why 0 records!

ADD REPLYlink modified 10 months ago by RamRS24k • written 11 months ago by Satyajeet Khare1.5k

Mine is a small dataset (4 samples). Is that the problem? Should I just shift to hard filtration such as this?

QD<2.0||MQ<40.0||FS>60.0||MQRankSum<-12.5||ReadPosRankSum<-8.0
ADD REPLYlink modified 10 months ago by RamRS24k • written 11 months ago by Satyajeet Khare1.5k

That's unusual. You might have gone through this GATK forum, but if not, see whether it can be useful to you.

ADD REPLYlink written 11 months ago by toralmanvar820

Thanks, I will try running without -an QD parameter, but here are a couple of other peculiar observations.

  1. I do see QD annotations in my VCF file, even without running the VariantAnnotator also.
  2. If I try and validate the resource VCF files from gatk resource bundle (1000G and hapmap), the Processed sites are shown as 0 (fasta file is from NCBI).

Could this be an incompatibility issue between gatk VCF files and the genome fasta file (Homo_sapiens.GRCh38.dna.primary_assembly.fa) downloaded from NCBI?

Note: Validate variants works perfectly fine with 00-All.vcf.gz file which is also downloaded from NCBI and I see millions of processed sites.

ADD REPLYlink modified 11 months ago • written 11 months ago by Satyajeet Khare1.5k

Usually, in case of incompatibility of reference genome you end up with this kind of problems. But this not the case here. But give a try using reference genome from this link for 1000G VCF files.

ADD REPLYlink written 11 months ago by toralmanvar820
2

The issue is resolved. I made two changes. I don't know which one worked.

  1. I had skipped the local realignment step earlier. That is, I had used the Picard output directly for BQSR. I corrected the pipeline by incorporating that step.
  2. I used genome.fasta and all .vcf files from gatk resource bundle to avoid compatibility issues.

Don't know which one worked. Upvoted your VCF validation comment. That gave the clue!

ADD REPLYlink modified 11 months ago • written 11 months ago by Satyajeet Khare1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1716 users visited in the last hour