Hi all, I have a VCF that I made by following the GATK Best Practices workflow, and I filtered genotypes with low GQ (< 20). However, I understand that they are not removed; instead they are tagged as "FILTER_GQ-20" in my VCF.
gatk VariantFiltration \
    -V all_jointcalls_sRecal_allPASS_PP.vcf \
    -G-filter "GQ < 20" -G-filter-name "FILTER_GQ-20" \
    -O all_jointcalls_sRecal_allPASS_PP2.vcf
I tried to remove all rows with FILTER_GQ-20 by doing a simple grep:
cat all_jointcalls_sRecal_allPASS_PP2.vcf | grep -v "FILTER_GQ-20" > all_jointcalls_sRecal_allPASS_GQ20orhiger.vcf
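For context on what this grep matches: as far as I understand, genotype-level filter tags are written per sample (in the FT FORMAT field), so in a multi-sample VCF grep -v drops a whole row as soon as any one sample carries the tag. Below is a minimal sketch for checking how many samples are tagged at each site. The function name tagged_samples_per_site is hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF whose sample columns start at column 10.

```shell
# Hypothetical helper: for each data row, print CHROM, POS and the number of
# sample columns that contain the given genotype-filter tag.
# Assumes an uncompressed, tab-delimited VCF (samples start at column 10).
tagged_samples_per_site() {
    tag="$1"
    vcf="$2"
    awk -F'\t' -v tag="$tag" '
        /^#/ { next }                       # skip header lines
        {
            n = 0
            for (i = 10; i <= NF; i++)      # loop over sample columns
                if (index($i, tag) > 0) n++ # substring match on the tag
            print $1, $2, n
        }' "$vcf"
}
```

Calling tagged_samples_per_site "FILTER_GQ-20" all_jointcalls_sRecal_allPASS_PP2.vcf would then list, per site, how many samples are tagged; any row with a count above zero is one that the grep removes entirely.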
Then I checked how many lines are left, i.e. the good ones with GQ >= 20:
cat all_jointcalls_sRecal_allPASS_GQ20orhiger.vcf | wc -l
212298
This seems way too low compared to the original VCF from Genotype Posteriors, all_jointcalls_sRecal_allPASS_PP2.vcf, which has 3598528 variants.
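One caveat on the comparison: wc -l counts header lines as well as variant records. A minimal sketch for counting only the data records in each file, so the two numbers are like for like (count_records is a hypothetical helper name, and an uncompressed VCF is assumed):

```shell
# Hypothetical helper: count data records in an uncompressed VCF,
# i.e. lines that do not start with '#'.
count_records() {
    grep -vc '^#' "$1"
}
```

For example, count_records all_jointcalls_sRecal_allPASS_PP2.vcf and count_records all_jointcalls_sRecal_allPASS_GQ20orhiger.vcf would give the record counts before and after the grep.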
So my questions are:
How do I remove the variants tagged FILTER_GQ-20 properly, in a GATK way, if a simple Unix command did not do the job right? I checked SelectVariants, but I don't think excluding by filter is right here. I also checked the other exclude options, but none seems right for what I need to do, hence the post!
Do I need to be worried about the low number passing the GQ filter? This is WES data.
Is it even necessary to remove them for downstream analysis such as VariantAnnotator or Funcotator?
Also, on another note: is it an absolute requirement to have a PED file for annotation and Funcotator?
Thank you in advance.