Question: Why only False Positives after VQSR ist step ?. Variant Annotation Software for combined g.vcf file !!
gravatar for ravi.uhdnis
14 months ago by
United States
ravi.uhdnis140 wrote:

Hi all, I am working with Human sample WGS data (30X, PE reads, HiSeq platform) and facing problem at VQSR and variant annotation step. I ran following commands (modified here, path deleted) to get my files, please see my 2 questions after these commands:

  1. Converted each sample's bam file into gVCF using GATK's HaplotypeCaller (-ERC GVCF function) for joint genotyping and VQSR, command:

/usr/bin/java -jar /opt/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T HaplotypeCaller -I ETH003388_final.bam -D rs_snp.vcf -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 --alleles rs_snp.vcf --max_alternate_alleles 3 -L rs_snp.vcf -ip 150 -o ETH003388.g.vcf -stand_call_conf 0 -stand_emit_conf 0

(where rs_snp.vcf is customized dbSNP.vcf with only variants having a SNP rsID).

  1. Combined all my individual sample's g.vcf file to form a cohort.g.vcf, command :

/usr/bin/java -jar /opt/GenomeAnalysisTK.jar -T CombineGVCFs -R ucsc.hg19.fasta -D rs_snp.vcf -o cohort.g.vcf --variant sample1.g.vcf --variant sample2.g.vcf --variant sample3.g.vcf ........ --variant sampleN.g.vcf

  1. Converted cohort.g.vcf to genotype.g.vcf, command:

/usr/bin/java -jar /opt/GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta -D rs_snp.vcf -allSites -stand_call_conf 0 -stand_emit_conf 0 -maxAltAlleles 3 --variant cohort.g.vcf -o genotype.g.vcf

  1. Running VariantRecalibrator command

/usr/bin/java -jar /opt/GenomeAnalysisTK.jar -T VariantRecalibrator -R ucsc.hg19.fasta -input genotype.g.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.sites.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 rs_snp.vcf -an DP -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R

Question 1: I am getting all my Variants in "False Positive" category in 'recalibrate_SNP.tranches.pdf'. Any idea why all my variants are falling in FP category ?.

Question 2. How can i annotate variants in genotype.g.vcf (of filtered.vcf file after ApplyRecalibration step) in order to get Nonsynonmous SNPs with their respective genotype information in all samples at a time ?. I have done variant annotation of individual sample.vcf files using 'Annovar' but never on combined.g.vcf file, any suggestion will be much appreciated.

ADD COMMENTlink modified 14 months ago • written 14 months ago by ravi.uhdnis140

I cannot immediately spot the error, in case you don't get help here (after waiting a bit) you might give the GATK forum a try

ADD REPLYlink written 14 months ago by WouterDeCoster26k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1556 users visited in the last hour