I am a bit confused. I have VCF files from GATK (exome data), and I want to filter out the low-quality variants and extract the variants with minor allele frequency < 0.05.
1. I thought that QUAL > 30 or > 40 indicates a good variant, but in some papers that follow the GATK protocol I saw QUAL < 30 being used. Which is correct?
Others use just VQSLOD > 0. I know that VQSLOD is the calibrated score.
What do you suggest? Should I filter on both, or on only one of QUAL and VQSLOD?
2. Also, the allele frequency reported by GATK (AF in the example below) is not the minor allele frequency, right? Where is it derived from? Also, in the example below AF=1.00. If it is 1.00, how is it a SNP?
chr1 15274 rs201931625 A T 10123.84 PASS AC=56;AF=1.00;AN=56;DB;DP=360;FS=0.000;InbreedingCoeff=-0.0065;MLEAC=55;MLEAF=0.982;MQ=26.38;QD=29.60;SOR=10.341;VQSLOD=20.31;culprit=MQ GT:AD:DP:GQ:PL 1/1:0,29:29:87:836,87,0 1/1:0,8:8:24:244,24,0
I want to extract all the variants that have a minor allele frequency < 0.05 according to the 1000 Genomes Project. How do I do that? Any ideas?
3. Last question: in the example above there is a DP in the INFO field and also in the FORMAT field for each sample. What is the difference? And when I filter for DP > 5, which field is the value taken from, INFO or FORMAT?
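To make questions 2 and 3 concrete, here is a minimal Python sketch (standard library only) that splits the example record above into its INFO values and its per-sample FORMAT values. The helper name parse_record is my own, and I am assuming whitespace-separated columns as pasted here (a real VCF file is tab-separated). Only the two sample columns shown above are included. It just shows where each value lives; it is not meant as a filter.

```python
def parse_record(line):
    """Split one VCF data line into an INFO dict and a list of per-sample dicts."""
    # Assumption: columns are whitespace-separated as pasted; real VCF uses tabs.
    cols = line.split()
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")
        # Flag entries such as DB carry no value, so store True for them.
        info[key] = value if value else True
    format_keys = cols[8].split(":")
    samples = [dict(zip(format_keys, s.split(":"))) for s in cols[9:]]
    return info, samples


record = (
    "chr1 15274 rs201931625 A T 10123.84 PASS "
    "AC=56;AF=1.00;AN=56;DB;DP=360;FS=0.000;InbreedingCoeff=-0.0065;"
    "MLEAC=55;MLEAF=0.982;MQ=26.38;QD=29.60;SOR=10.341;VQSLOD=20.31;culprit=MQ "
    "GT:AD:DP:GQ:PL 1/1:0,29:29:87:836,87,0 1/1:0,8:8:24:244,24,0"
)

info, samples = parse_record(record)

# Site-level values from the INFO column.
print("AC, AN, AF:", info["AC"], info["AN"], info["AF"])
print("AC / AN:", int(info["AC"]) / int(info["AN"]))
print("INFO DP:", info["DP"])

# Per-sample values from the FORMAT columns.
print("FORMAT DP per sample:", [s["DP"] for s in samples])
```

So the record carries one DP for the whole site in INFO and a separate DP for each sample in FORMAT, and the AF in INFO matches AC divided by AN for my own samples.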
Sorry for the many questions.
Any help is appreciated.