Dear All,
I am using the VariantRecalibrator step to build a Gaussian mixture model from the annotations of a high-quality subset of the input call set and then evaluate the input variants against it. The command ran without any error message and I got these output files: T_S7998.snps.VarRecal.tranches, T_S7998.snps.VarRecal.plot.R, T_S7998.snps.VarRecal.tranches.pdf, T_S7998.snps.VarRecal.recal.idx, T_S7998.snps.VarRecal.recal and T_S7998.GATKsnps.VarRecal.log. The tranches plot was created as a .pdf, but the plot that T_S7998.snps.VarRecal.plot.R should produce, which shows the true positive SNPs of my input call set against the training data sets in the Gaussian mixture model, was not created. Inside T_S7998.snps.VarRecal.plot.R the output path is set as outputPDF <- "/scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.snps.VarRecal.plot.R.pdf", but that file does not exist either. I suspect that the ggplot2 library the script needs is not installed in the R module on our cluster. I am running this on the server, so I cannot install the library. Can anyone tell me whether the missing library is the cause, or whether it is something else? The command I used to generate this plot is below.
java -Xmx14g -jar /data/PGP/gmelloni/GenomeAnalysisTK-2.3-4-g57ea19f/GenomeAnalysisTK.jar -T VariantRecalibrator -R /scratch/GT/vdas/test_exome/exome/hg19.fa -input /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998.GATKsnps.raw.vcf -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 /scratch/GT/vdas/test_exome/exome/databases/hapmap_3.3.hg19.vcf -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 /scratch/GT/vdas/test_exome/exome/databases/1000G_omni2.5.hg19.vcf -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=8.0 /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ --maxGaussians 4 -mode SNP -log /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.GATKsnps.VarRecal.log -recalFile /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.snps.VarRecal.recal -tranchesFile /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.snps.VarRecal.tranches -rscriptFile /scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.snps.VarRecal.plot.R --percentBadVariants 0.05
The command looks fine to me, and so do the resource training sets I used to model my input call set and obtain the true positives. I would really appreciate any suggestions for getting the plot.
Thanks a lot.
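For anyone checking the same thing, here is a minimal shell sketch that reads the outputPDF path out of the generated .R file and reports whether that PDF was actually written. It assumes the .R file contains a line of the form outputPDF <- "..." as quoted above; the function names plot_pdf_path and check_plot_pdf are made up for illustration and are not part of GATK.

```shell
#!/bin/sh
# Print the PDF path that the generated plot script intends to write.
# Assumption: the script has a line like  outputPDF <- "/some/path.pdf"
plot_pdf_path() {
    sed -n 's/^[[:space:]]*outputPDF[[:space:]]*<-[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" | head -n 1
}

# Report whether that PDF exists and is non-empty.
# Returns 0 if found, 1 if missing, 2 if no outputPDF line was found.
check_plot_pdf() {
    pdf=$(plot_pdf_path "$1")
    if [ -z "$pdf" ]; then
        echo "no outputPDF line found in $1"
        return 2
    fi
    if [ -s "$pdf" ]; then
        echo "found: $pdf"
    else
        echo "missing: $pdf"
        return 1
    fi
}
```

Usage would be something like check_plot_pdf T_S7998.snps.VarRecal.plot.R. To test the library itself, running Rscript -e 'library(ggplot2)' in the same environment as the GATK job should print an error if ggplot2 cannot be loaded.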
I just wanted to be sure and get a second opinion from you. I have since installed the ggplot2 library in my home directory and reran the command, hoping it would create the plot this time, but it did not. The T_S7998.snps.VarRecal.plot.R file is created, but no T_S7998.snps.VarRecal.plot.R.pdf appears at the path set inside the .R file, which is outputPDF <- "/scratch/GT/vdas/pietro/exome_seq/results/T_S7998/T_S7998_VarRecal/T_S7998.snps.VarRecal_2.plot.R.pdf". I am confused about why it still fails. The run produces two plots: a normal one, which was created the second time as well, and one that uses the ggplot2 library, which was not. Do I have to load the R module before running the command? That should not be needed, right? Please let me know if you have any suggestions.
Or do I have to run T_S7998.snps.VarRecal.plot.R myself once the VariantRecalibrator step has created it, in order to generate the plots?
Thanks. I managed to load the library locally from my home directory and then reran the R script, and it is now generating the plots.
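In case it helps someone else, this is roughly what the manual rerun looks like as a shell sketch. It is only an illustration under assumptions: the default library directory $HOME/R/library is hypothetical (use wherever ggplot2 was actually installed), and rerun_plot_script is a made-up helper name. R_LIBS_USER is the standard R environment variable for a personal library location.

```shell
#!/bin/sh
# Rerun a generated plot script by hand, with a personal R library
# made visible through R_LIBS_USER.
# $1: path to the .R file written by VariantRecalibrator
# $2: personal library directory (hypothetical default: $HOME/R/library)
rerun_plot_script() {
    rscript_file=$1
    user_lib=${2:-"$HOME/R/library"}
    # Rscript must be on PATH; on a cluster this may mean loading the R module first.
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 127
    fi
    # Set the variable for this one command only.
    R_LIBS_USER="$user_lib" Rscript "$rscript_file"
}
```

For example: rerun_plot_script T_S7998.snps.VarRecal.plot.R "$HOME/R/library". Setting the variable only for that one command keeps the rest of the session unchanged.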