I received an hg38 VCF file in which variants were imputed with 1000 Genomes. I've encountered several issues with the VCF: REF alleles that do not match the reference genome, ALT alleles that do not appear to be reported anywhere in the literature, and, most recently, variants whose positions flat-out do not exist in the human genome (variants on chr19 at positions above 100 million bp, when the whole chromosome is only in the 50 million bp range).
I've worked out hack-y solutions to most of the issues that I've encountered, but this latest one has been a problem for me. I only detected these variants when I ran VEP and it flagged them as not mapping to the genome. As such, I'm more or less removing these variants one at a time using grep -v. I'd like a solution where I can just remove any variants from the VCF that appear to map to regions that do not exist in the human genome. Bonus points if the solution also encompasses some of the other issues I mentioned, although I think I've already found solutions to those. Is there anything out there that does this?
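To make the request concrete, here is a rough Python sketch of the kind of filter I have in mind. It is only an illustration under some assumptions: the function names are made up, the contig lengths are taken from a FASTA index (.fai) for the same hg38 build that the VCF claims to use, and the VCF is read as plain uncompressed text lines. It keeps header lines and drops any record whose POS is beyond the end of its contig.

```python
def read_contig_lengths(fai_lines):
    """Map contig name -> length from the lines of a FASTA index (.fai).

    Assumes the standard tab-separated .fai layout, where column 1 is the
    contig name and column 2 is its length in bases.
    """
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def drop_out_of_range(vcf_lines, lengths):
    """Yield header lines plus records whose POS lies within the contig."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        chrom, pos = line.split("\t", 2)[:2]
        # Assumption: records on contigs missing from the index are dropped
        # as well; change this test to keep them instead.
        if chrom in lengths and 1 <= int(pos) <= lengths[chrom]:
            yield line
```

In practice this would be fed the open .fai and VCF file handles, with the surviving lines written to a new file. It only addresses the out-of-range positions, not the REF or ALT allele problems.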