Hi all,
Suppose there are multiple samples, say 500, that were used for variant calling with HaplotypeCaller (GATK) followed by joint genotyping to produce the final VCF file. Hard filtering can now be applied; after hard filtering, the FILTER column of the VCF file is filled in, indicating whether each variant passed (PASS) or was filtered. From what I have read, only the PASS variants should be used for downstream analysis. I think a filtered flag means that the variant was filtered in all 500 samples, is that right? However, I cannot understand how that can happen: how is it possible that variant X had low quality in all samples? Could you please clarify this for me?
Many thanks in advance
Doesn't hard filtering mean that non-PASS entries are removed? Isn't the VCF FILTER column a soft filter, which marks records but doesn't remove them?
I think the filter you define determines what it means. The way I understand FILTER: you can define a filter F such as "N% of samples have attribute X of the sort Y", and records that fail it will be marked with that filter's flag.
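To make this concrete, here is a minimal Python sketch on toy data. It only illustrates that FILTER holds one value per record (site), shared by all sample columns, rather than one value per sample. The sample names S1 to S3, the filter name lowQD, the annotation values, and the helper functions parse_records and passing are all made up for illustration; this is not GATK's implementation.

```python
# Toy VCF body (hypothetical data): three samples, two records.
VCF_TEXT = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
chr1\t1000\t.\tA\tG\t812.4\tPASS\tQD=21.3;FS=1.2\tGT\t0/1\t1/1\t0/0
chr1\t2000\t.\tC\tT\t45.1\tlowQD\tQD=1.4;FS=3.0\tGT\t0/1\t0/0\t0/0
"""


def parse_records(text):
    """Yield (chrom, pos, failed_filters, genotypes) for each data line."""
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        filt = fields[6]
        # FILTER is "PASS", "." (no filtering applied), or a
        # semicolon-separated list of the filters the record failed.
        failed = [] if filt in ("PASS", ".") else filt.split(";")
        # Assumption: GT is the first FORMAT key, as in this toy data.
        genotypes = dict(zip(samples, (s.split(":")[0] for s in fields[9:])))
        yield fields[0], int(fields[1]), failed, genotypes


def passing(text):
    """Return (chrom, pos) for records with no failed filter.

    Note: records with FILTER "." are treated as passing here,
    which is a choice of this sketch.
    """
    return [(c, p) for c, p, failed, _ in parse_records(text) if not failed]


for chrom, pos, failed, gts in parse_records(VCF_TEXT):
    status = "PASS" if not failed else ";".join(failed)
    print(chrom, pos, status, gts)
```

The second record carries a single lowQD flag even though only S1 has a non-reference genotype: the flag describes the site as a whole. The record also stays in the file, so dropping it is a separate, explicit step (here, `passing`).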
Hi Ram,
Thanks for the comment. So it is reasonable to use variants marked with a FILTER flag for further analysis, as you mentioned, yes? Could you please tell me why this filtering is applied and what its benefit is? I'm looking for a guide to filtering variants derived from whole-genome sequencing of a given population. Any suggestions would be highly appreciated.
I think finswimmer's answer below covers your questions. Please check out the links and let us know if you have any further questions.