You sound a little confused about this topic. Let's try to sort it out :)
Filtering after variant calling is done to remove false-positive variants. If you follow the GATK pipeline there are two ways:
- VQSR: this is only applicable to larger sequencing projects
- manual filtering based on specified criteria
Manual filtering can be divided into hard filtering (meaning variants will be removed) and soft filtering (meaning variants will be kept and flagged).
following hard-filtering of variants, the "Filter column" is added to the vcf file
Strictly speaking, this is not correct. The Filter column is already there, as it is a mandatory column (see specs). Depending on the variant caller, the value in this column is set to PASS. You can now define criteria by which you believe that a variant in your vcf file is a false positive. Using soft filtering, you add a name to the Filter column to record why you believe this variant isn't true. Normally you would then continue your downstream analysis only with the variants that have the flag PASS (meaning they don't match any soft-filtering criteria).
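To make the soft-filtering idea concrete, here is a minimal sketch in plain Python. The filter names (LowQD, HighFS), the INFO keys and the thresholds are only illustrative assumptions, not a recommendation, and in practice you would more likely use a dedicated tool for this. It just shows the mechanics: a variant that matches a criterion gets the filter name written into column 7, everything else gets PASS.

```python
def parse_info(info):
    """Turn an INFO string like 'QD=1.5;FS=70.2;DB' into a dict."""
    out = {}
    if info == ".":
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags without '=' become True
    return out


# Hypothetical criteria: (filter name, INFO key, test that is True if the variant FAILS)
FILTERS = [
    ("LowQD", "QD", lambda v: v < 2.0),
    ("HighFS", "FS", lambda v: v > 60.0),
]


def soft_filter(line, filters=FILTERS):
    """Return the VCF record with its FILTER column (index 6) updated."""
    fields = line.rstrip("\n").split("\t")
    info = parse_info(fields[7])

    # Keep filter names that are already set, but drop '.' and 'PASS'.
    names = [] if fields[6] in (".", "PASS") else fields[6].split(";")

    for name, key, fails in filters:
        try:
            value = float(info[key])
        except (KeyError, TypeError, ValueError):
            continue  # annotation missing or not a single number: skip this test
        if fails(value) and name not in names:
            names.append(name)

    fields[6] = ";".join(names) if names else "PASS"
    return "\t".join(fields)


def passes(line):
    """True if the record carries the PASS flag."""
    return line.rstrip("\n").split("\t")[6] == "PASS"


records = [
    "chr1\t100\t.\tA\tG\t50\t.\tQD=1.5;FS=70.2",
    "chr1\t200\t.\tC\tT\t99\t.\tQD=20.0;FS=1.0",
]
flagged = [soft_filter(r) for r in records]
kept = [r for r in flagged if passes(r)]  # what you would take downstream
```

Note that a complete vcf would also need a matching ##FILTER line in the header for every filter name you introduce; the sketch leaves that out.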
Which criteria should you use to find false positives? That's the holy grail of variant finding. Nevertheless, GATK has a few recommendations on where to start. But don't treat them as a gold standard: you will still have false positives in your list and may have filtered out some true positives.
If you already have something other than . in your vcf, look into the header. There should be a description of what each filter name means.
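If you don't want to scroll through the header by hand, a small sketch like the following can pull out the filter descriptions. It assumes the header uses the usual `##FILTER=<ID=...,Description="...">` form; the function name and the example header lines are made up for illustration.

```python
import re

# Matches e.g. ##FILTER=<ID=LowQual,Description="Low quality">
FILTER_LINE = re.compile(r'^##FILTER=<ID=([^,>]+),Description="((?:[^"\\]|\\.)*)"')


def filter_descriptions(lines):
    """Map each filter ID declared in the VCF header to its description."""
    descriptions = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over, records follow
        match = FILTER_LINE.match(line)
        if match:
            descriptions[match.group(1)] = match.group(2)
    return descriptions


# Made-up header for illustration; with a real file, pass the open file handle.
header = [
    "##fileformat=VCFv4.2",
    '##FILTER=<ID=LowQual,Description="Low quality">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
descriptions = filter_descriptions(header)
```

For an uncompressed file, something like `with open("my.vcf") as fh: filter_descriptions(fh)` should work the same way (the file name here is a placeholder).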