Why only False Positives after VQSR ist step ?. Variant Annotation Software for combined g.vcf file !!
0
0
Entering edit mode
7.3 years ago
ravi.uhdnis ▴ 220

Hi all, I am working with Human sample WGS data (30X, PE reads, HiSeq platform) and facing problem at VQSR and variant annotation step. I ran following commands (modified here, path deleted) to get my files, please see my 2 questions after these commands:

  1. Converted each sample's bam file into gVCF using GATK's HaplotypeCaller (-ERC GVCF function) for joint genotyping and VQSR, command:
  2. /usr/bin/java -jar /opt/GenomeAnalysisTK.jar \
                            -R ucsc.hg19.fasta \
                            -T HaplotypeCaller \
                            -I ETH003388_final.bam \
                            -D rs_snp.vcf \
                            -ERC GVCF \
                            -variant_index_type LINEAR \
                            -variant_index_parameter 128000 \
                            --alleles rs_snp.vcf \
                            --max_alternate_alleles 3 \
                            -L rs_snp.vcf \
                            -ip 150 \
                            -o ETH003388.g.vcf \
                            -stand_call_conf 0 \
                            -stand_emit_conf 0 
    
    (where rs_snp.vcf is customized dbSNP.vcf with only variants having a SNP rsID).
  3. Combined all my individual sample's g.vcf file to form a cohort.g.vcf, command :
  4. /usr/bin/java -jar /opt/GenomeAnalysisTK.jar \
                        -T CombineGVCFs \
                        -R ucsc.hg19.fasta \
                        -D rs_snp.vcf \
                        -o cohort.g.vcf \
                        --variant sample1.g.vcf \
                        --variant sample2.g.vcf \
                        --variant sample3.g.vcf \
                        ........ \
                        --variant sampleN.g.vcf 
    
  5. Converted cohort.g.vcf to genotype.g.vcf, command:
  6. /usr/bin/java  -jar /opt/GenomeAnalysisTK.jar \
                    -T GenotypeGVCFs \
                    -R ucsc.hg19.fasta \
                    -D rs_snp.vcf \
                    -allSites \
                    -stand_call_conf 0 \
                    -stand_emit_conf 0 \
                    -maxAltAlleles 3 \
                    --variant cohort.g.vcf \
                    -o genotype.g.vcf 
    
  7. Running VariantRecalibrator command
  8. /usr/bin/java  -jar /opt/GenomeAnalysisTK.jar \
                        -T VariantRecalibrator \
                        -R ucsc.hg19.fasta \
                        -input genotype.g.vcf \
                        -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf \
                        -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.sites.vcf \
                        -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf \
                        -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 rs_snp.vcf \
                        -an DP \
                        -an QD \
                        -an MQ \
                        -an MQRankSum \
                        -an ReadPosRankSum \
                        -an FS \
                        -an SOR \
                        -an InbreedingCoeff \
                        -mode SNP \
                        -tranche 100.0 \
                        -tranche 99.9 \
                        -tranche 99.0  \
                        -tranche 90.0 \
                        -recalFile recalibrate_SNP.recal \
                        -tranchesFile recalibrate_SNP.tranches \
                        -rscriptFile recalibrate_SNP_plots.R
    

Question 1: I am getting all my Variants in "False Positive" category in 'recalibrate_SNP.tranches.pdf'. Any idea why all my variants are falling in FP category ?.

Question 2. How can i annotate variants in genotype.g.vcf (of filtered.vcf file after ApplyRecalibration step) in order to get Nonsynonmous SNPs with their respective genotype information in all samples at a time ?. I have done variant annotation of individual sample.vcf files using 'Annovar' but never on combined.g.vcf file, any suggestion will be much appreciated.

SNP next-gen sequencing Assembly genome • 3.1k views
ADD COMMENT
0
Entering edit mode

I cannot immediately spot the error, in case you don't get help here (after waiting a bit) you might give the GATK forum a try

ADD REPLY
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLY
0
Entering edit mode

I don't know... I personally don't in any way believe that these extra steps by the GATK are necessary. You would make your life a whole lot easier by just using samtools mpileup to call SNVs, and pindel to call indels. For annotation, use Variant Effect Predictor.

ADD REPLY

Login before adding your answer.

Traffic: 1875 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6