Gatk Variantrecalibrator Is Aborted By Some Problems.
11.9 years ago
Chris ▴ 40

Hi @all! This is a question about GATK VariantRecalibrator.

My data consist of 50 exome-sequencing samples at 15X average coverage. Everything seems fine at the beginning, but an error appears at the end:

ERROR MESSAGE: NaN LOD value assigned. Clustering with this few variants and these annotations is unsafe. Please consider raising the number of variants used to train the negative model (via --percentBadVariants 0.05, for example) or lowering the maximum number of Gaussians to use in the model (via --maxGaussians 4, for example)

The command I used:

java -Xmx1555m -jar /home/chris/install/GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
  -R /home/chris/data/hg/ucsc.hg19.fasta \
  -T VariantRecalibrator \
  -input /home/chris/data/train/SRR_50bam.raw.l.new.vcf \
  -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/chris/data/train/hapmap_3.3.hg19.sites.vcf \
  -resource:omni,known=false,training=true,true=false,prior=12.0 /home/chris/data/train/1000G_omni2.5.hg19.sites.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 /home/chris/data/hg/dbsnp_135.hg19.vcf \
  -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -an FS -an DP -an MQ -an InbreedingCoeff \
  -recalFile /home/chris/SRR_50bam.recal \
  -tranchesFile /home/chris/SRR_50bam.tranches \
  -rscriptFile /home/chris/plots.R \
  -nt 2 -mG 4 -percentBad 0.05 \
  -L /home/chris/data/train/exome.bed
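With a command this long, small inconsistencies are easy to miss. The sketch below is only a rough sanity check, not a GATK feature: it compares the tag names used across the -resource arguments and looks for -an annotations listed more than once. The function name check_command is made up for illustration, and the rule "a tag key used by only one resource is suspicious" is an assumption, not something GATK itself enforces.

```python
import shlex
from collections import Counter

# The -resource and -an arguments copied from the command in the post.
COMMAND = """
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/chris/data/train/hapmap_3.3.hg19.sites.vcf
-resource:omni,known=false,training=true,true=false,prior=12.0 /home/chris/data/train/1000G_omni2.5.hg19.sites.vcf
-resource:dbsnp,known=true,training=false,truth=false,prior=8.0 /home/chris/data/hg/dbsnp_135.hg19.vcf
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff -an FS -an DP -an MQ -an InbreedingCoeff
"""


def check_command(command):
    """Return (odd_tags, duplicate_annotations) for a VariantRecalibrator command.

    Hypothetical helper: a tag key is reported as odd when exactly one
    resource uses it, which is a heuristic chosen for this sketch.
    """
    tokens = shlex.split(command)
    resource_keys = {}
    annotations = []
    for i, tok in enumerate(tokens):
        if tok.startswith("-resource:"):
            # "-resource:name,key=value,..." -> name plus its tag keys
            name, *tags = tok[len("-resource:"):].split(",")
            resource_keys[name] = {t.split("=", 1)[0] for t in tags}
        elif tok == "-an" and i + 1 < len(tokens):
            annotations.append(tokens[i + 1])

    # Count how many resources use each tag key.
    key_counts = Counter(k for keys in resource_keys.values() for k in keys)
    odd_tags = {}
    for name, keys in resource_keys.items():
        unusual = sorted(k for k in keys if key_counts[k] == 1)
        if unusual:
            odd_tags[name] = unusual

    duplicates = [a for a, n in Counter(annotations).items() if n > 1]
    return odd_tags, duplicates


if __name__ == "__main__":
    odd, dups = check_command(COMMAND)
    print("tag keys used by a single resource:", odd)
    print("annotations passed more than once:", dups)
```

On the command above, this flags the omni resource, which uses true=false where the other two resources use truth=..., and -an InbreedingCoeff, which is passed twice. Whether either of these contributes to the error is not established here.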

Note that I have already used the -mG 4 and -percentBad 0.05 parameters. An INFO line said: INFO 11:49:45,070 VariantDataManager - Additionally training with worst 5.000% of passing data --> 3942 variants with LOD <= 0.0000.

I read somewhere that the cause may be that too few variants (the worst X percent) are used to train the negative model. But when I changed -percentBad to 0.15, the error still appeared. The INFO line: INFO 13:10:13,059 VariantDataManager - Additionally training with worst 15.000% of passing data --> 11825 variants with LOD <= 0.0000. The LOD is still 0. I don't know why the process can't complete. And what is LOD? My raw VCF contains almost 80000 called SNPs. Are 3942 or 11825 variants really not enough?

Here is a small sample from my raw VCF:

chr1 881627 rs2272757 G A 1249.50 . AC=52;AF=0.650;AN=80;BaseQRankSum=4.484;DB;DP=280;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.4991;InbreedingCoeff=0.1468;MQ=34.16;MQ0=5;MQRankSum=-1.772;QD=5.98;ReadPosRankSum=0.671;SB=-629.98 GT:AD:DP:GQ:PL 0/1:3,5:8:64.89:68,0,65 ./. ........... 0/0:2,0:2:3:0,3,25
chr1 881784 . C T 124.05 . AC=2;AF=0.021;AN=96;BaseQRankSum=0.481;DP=430;Dels=0.00;FS=17.640;HRun=1;HaplotypeScore=0.8485;InbreedingCoeff=-0.0583;MQ=39.06;MQ0=3;MQRankSum=-0.937;QD=4.00;ReadPosRankSum=-0.505;SB=-2.11 GT:AD:DP:GQ:PL 0/0:8,0:8:21.03:0,21,201 ./. ..............0/0:4,0:4:6.01:0,6,61
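Two records are too few to tell whether every annotation requested with -an is present across the whole call set. A minimal sketch for counting that is given below. The helper names parse_info and count_missing are hypothetical, and the code splits on whitespace only because the records are shown with spaces here; a real VCF is tab-delimited. It is not claimed that missing annotations cause the NaN LOD error; this only makes the input easier to inspect.

```python
def parse_info(info_field):
    """Turn 'AC=52;DB;DP=280' into {'AC': '52', 'DB': True, 'DP': '280'}."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flag-style entries such as DB carry no value.
        info[key] = value if sep else True
    return info


def count_missing(vcf_lines, wanted):
    """Return (records_seen, {annotation: records lacking it})."""
    missing = {name: 0 for name in wanted}
    total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split(None, 8)
        if len(fields) < 8:
            continue  # not a complete variant record
        total += 1
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        for name in wanted:
            if name not in info:
                missing[name] += 1
    return total, missing


# Annotations passed with -an in the command from the post.
WANTED = ["QD", "HaplotypeScore", "MQRankSum", "ReadPosRankSum",
          "InbreedingCoeff", "FS", "DP", "MQ"]

# First eight columns of the two records quoted in the post.
SAMPLE = [
    "chr1 881627 rs2272757 G A 1249.50 . AC=52;AF=0.650;AN=80;BaseQRankSum=4.484;DB;DP=280;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.4991;InbreedingCoeff=0.1468;MQ=34.16;MQ0=5;MQRankSum=-1.772;QD=5.98;ReadPosRankSum=0.671;SB=-629.98",
    "chr1 881784 . C T 124.05 . AC=2;AF=0.021;AN=96;BaseQRankSum=0.481;DP=430;Dels=0.00;FS=17.640;HRun=1;HaplotypeScore=0.8485;InbreedingCoeff=-0.0583;MQ=39.06;MQ0=3;MQRankSum=-0.937;QD=4.00;ReadPosRankSum=-0.505;SB=-2.11",
]

if __name__ == "__main__":
    total, missing = count_missing(SAMPLE, WANTED)
    print(total, "records checked")
    for name, n in missing.items():
        print(name, "missing in", n, "records")
```

To run it on a full file, pass an open file handle instead of SAMPLE, for example count_missing(open(path), WANTED), where path is the raw VCF.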

I am so sad! Look forward to your reply!

gatk • 5.0k views
10.9 years ago

Just so this question is not left unanswered: this has been covered in "Gatk Variantrecalibrator Error Message".
