Question: Why only False Positives after VQSR ist step ?. Variant Annotation Software for combined g.vcf file !!
gravatar for ravi.uhdnis
2.7 years ago by
United States
ravi.uhdnis170 wrote:

Hi all, I am working with Human sample WGS data (30X, PE reads, HiSeq platform) and facing problem at VQSR and variant annotation step. I ran following commands (modified here, path deleted) to get my files, please see my 2 questions after these commands:

  1. Converted each sample's bam file into gVCF using GATK's HaplotypeCaller (-ERC GVCF function) for joint genotyping and VQSR, command:
  2. /usr/bin/java -jar /opt/GenomeAnalysisTK.jar \
                            -R ucsc.hg19.fasta \
                            -T HaplotypeCaller \
                            -I ETH003388_final.bam \
                            -D rs_snp.vcf \
                            -ERC GVCF \
                            -variant_index_type LINEAR \
                            -variant_index_parameter 128000 \
                            --alleles rs_snp.vcf \
                            --max_alternate_alleles 3 \
                            -L rs_snp.vcf \
                            -ip 150 \
                            -o ETH003388.g.vcf \
                            -stand_call_conf 0 \
                            -stand_emit_conf 0 
    (where rs_snp.vcf is customized dbSNP.vcf with only variants having a SNP rsID).
  3. Combined all my individual sample's g.vcf file to form a cohort.g.vcf, command :
  4. /usr/bin/java -jar /opt/GenomeAnalysisTK.jar \
                        -T CombineGVCFs \
                        -R ucsc.hg19.fasta \
                        -D rs_snp.vcf \
                        -o cohort.g.vcf \
                        --variant sample1.g.vcf \
                        --variant sample2.g.vcf \
                        --variant sample3.g.vcf \
                        ........ \
                        --variant sampleN.g.vcf 
  5. Converted cohort.g.vcf to genotype.g.vcf, command:
  6. /usr/bin/java  -jar /opt/GenomeAnalysisTK.jar \
                    -T GenotypeGVCFs \
                    -R ucsc.hg19.fasta \
                    -D rs_snp.vcf \
                    -allSites \
                    -stand_call_conf 0 \
                    -stand_emit_conf 0 \
                    -maxAltAlleles 3 \
                    --variant cohort.g.vcf \
                    -o genotype.g.vcf 
  7. Running VariantRecalibrator command
  8. /usr/bin/java  -jar /opt/GenomeAnalysisTK.jar \
                        -T VariantRecalibrator \
                        -R ucsc.hg19.fasta \
                        -input genotype.g.vcf \
                        -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf \
                        -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.sites.vcf \
                        -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf \
                        -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 rs_snp.vcf \
                        -an DP \
                        -an QD \
                        -an MQ \
                        -an MQRankSum \
                        -an ReadPosRankSum \
                        -an FS \
                        -an SOR \
                        -an InbreedingCoeff \
                        -mode SNP \
                        -tranche 100.0 \
                        -tranche 99.9 \
                        -tranche 99.0  \
                        -tranche 90.0 \
                        -recalFile recalibrate_SNP.recal \
                        -tranchesFile recalibrate_SNP.tranches \
                        -rscriptFile recalibrate_SNP_plots.R

Question 1: I am getting all my Variants in "False Positive" category in 'recalibrate_SNP.tranches.pdf'. Any idea why all my variants are falling in FP category ?.

Question 2. How can i annotate variants in genotype.g.vcf (of filtered.vcf file after ApplyRecalibration step) in order to get Nonsynonmous SNPs with their respective genotype information in all samples at a time ?. I have done variant annotation of individual sample.vcf files using 'Annovar' but never on combined.g.vcf file, any suggestion will be much appreciated.

ADD COMMENTlink modified 17 months ago by RamRS24k • written 2.7 years ago by ravi.uhdnis170

I cannot immediately spot the error, in case you don't get help here (after waiting a bit) you might give the GATK forum a try

ADD REPLYlink written 2.7 years ago by WouterDeCoster40k

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLYlink written 17 months ago by RamRS24k

I don't know... I personally don't in any way believe that these extra steps by the GATK are necessary. You would make your life a whole lot easier by just using samtools mpileup to call SNVs, and pindel to call indels. For annotation, use Variant Effect Predictor.

ADD REPLYlink written 17 months ago by Kevin Blighe48k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1885 users visited in the last hour