VCF files have traditionally contained variants marked with the filter status "PASS." However, if we now encounter a VCF file that includes not only "PASS" but also "REFCALL" (or similar labels indicating reference calls), how should we approach their inclusion in the analysis?
Specifically:
What are the implications of having both "PASS" and "REFCALL" entries in the VCF file?
How should the "REFCALL" records be utilized during variant analysis?
How does the standard analysis pipeline need to be adjusted to properly handle reference calls alongside variant calls?
During the annotation step, is any special processing required for "REFCALL" entries? I only have this one VCF file and have not encountered "REFCALL" before, so I am unsure how to use it in the analysis. Your guidance would be appreciated.
The VCF specification defines the FILTER field as follows: "PASS if this position has passed all filters, i.e., a call is made at this position. Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail. If filters have not been applied, then this field should be set to the missing value. (String, no whitespace or semicolons permitted)"
Any other FILTER value should be explained in the VCF header, in a ##FILTER line.
If the header does not explain it, you should ask the person who generated the VCF.
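To check what a label such as "REFCALL" means in your file, you can list the ##FILTER lines from the header. The following is a minimal sketch using only the Python standard library. The function name `filter_definitions` and the example header lines are made up for illustration, and the pattern assumes the common layout where ID comes first and Description is the last key.

```python
import re

# Matches header lines such as ##FILTER=<ID=RefCall,Description="...">
# Assumption: ID is the first key and Description is the last one.
FILTER_LINE = re.compile(r'^##FILTER=<ID=([^,>]+),Description="(.*)">\s*$')


def filter_definitions(lines):
    """Return {filter ID: description} from the ## header lines of a VCF."""
    definitions = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = FILTER_LINE.match(line)
        if match:
            definitions[match.group(1)] = match.group(2)
    return definitions


# Illustrative header only; the descriptions in your file may differ.
example_header = [
    '##fileformat=VCFv4.2',
    '##FILTER=<ID=PASS,Description="All filters passed">',
    '##FILTER=<ID=RefCall,Description="Site called as reference">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]

for filter_id, description in filter_definitions(example_header).items():
    print(filter_id, "=", description)
```

For a real file, pass an open file handle instead of the example list (use `gzip.open(path, "rt")` for a compressed VCF).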
Which variant caller was used? Was it DeepVariant? Is your VCF a single-sample file? If I am not mistaken (and if it is a single-sample VCF), REFCALL means that the position has a reference call, so the genotype at that position should be 0/0. In other words, there is no variation at that position. Such a single-sample VCF might be used downstream for multi-sample variant/genotype calling (for example with GLnexus). If you are working only with a single-sample VCF, feel free to remove or filter out those positions.
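If you decide to drop the reference calls, the filtering can be sketched as below. This is a minimal standard-library example, not a replacement for a dedicated tool. The function name `drop_reference_calls`, the label spellings "RefCall" and "REFCALL", and the example records are assumptions, so check the exact label used in your own FILTER column first.

```python
def drop_reference_calls(lines, refcall_labels=("RefCall", "REFCALL")):
    """Yield VCF lines, skipping records whose FILTER column has a reference-call label."""
    labels = set(refcall_labels)
    for line in lines:
        if line.startswith("#"):
            yield line  # keep every header line unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        filters = set(fields[6].split(";"))  # FILTER is the 7th column
        if filters & labels:
            continue  # reference call: no variant at this position
        yield line


# Illustrative records only (tab-separated, single sample).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t35\tPASS\t.\tGT\t0/1",
    "chr1\t200\t.\tC\tT\t0\tRefCall\t.\tGT\t0/0",
]

kept = list(drop_reference_calls(example))
for line in kept:
    print(line)
```

If bcftools is available, `bcftools view -f PASS` should achieve the same result by keeping only records whose FILTER is PASS.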