Dear community members,
I have data from an Illumina array, and after conversion to VCF it looks like this (header plus one record as an example):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NAME
1 752721 rs3131972 C T . . PR GT 0/1
Now I need to extract information about these variants from a large cohort of WGS samples.
The problem is that C is not actually the REF allele for this variant ( https://www.ncbi.nlm.nih.gov/snp/rs3131972?horizontal_tab=true ). For some variants the REF in my VCF matches the true reference allele, but for about half of them REF and ALT are switched.
When I look up this variant in the array specs, I see a line
so the variant here is even A/G.
Is there a way to normalize a VCF against the reference genome so that REF/ALT are fixed? I am completely lost: I expected this to be a very simple procedure, but it seems very complex. I can't even rely on rs-IDs, since they are missing for many array variants.
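To make clear what I mean by "normalize", here is a minimal Python sketch of the per-SNP logic I have in mind. It is only an illustration under assumptions: the function names are hypothetical, the reference base is passed in by hand (in practice it would have to be looked up in the reference FASTA at CHROM:POS), and I assume the reference base for the example record is A, consistent with the A/G listed for this variant. It handles biallelic SNPs only and gives up on A/T and C/G variants, where the strand cannot be inferred from the alleles alone.

```python
# Complement of each base, used to test the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def swap_gt(gt):
    """Swap allele indices 0 <-> 1 in a genotype string such as '0/1'."""
    return "".join({"0": "1", "1": "0"}.get(ch, ch) for ch in gt)

def normalize_snp(ref_base, ref, alt, gt):
    """Return (ref, alt, gt) with ref equal to ref_base, or None if unresolvable.

    ref_base is the base of the reference genome at this position
    (assumed to be supplied by the caller).
    """
    if COMPLEMENT[ref] == alt:
        # A/T or C/G SNP: both strands give the same allele pair.
        return None
    # Try the alleles as given, then on the opposite strand.
    for r, a in ((ref, alt), (COMPLEMENT[ref], COMPLEMENT[alt])):
        if r == ref_base:
            return r, a, gt
        if a == ref_base:
            # REF and ALT are switched, so genotype indices must be switched too.
            return a, r, swap_gt(gt)
    return None

# The record from above: C/T in the VCF, reference base assumed to be A.
print(normalize_snp("A", "C", "T", "0/1"))
```

Here the record needs both a strand flip (C/T becomes G/A) and a REF/ALT swap, which is exactly the combination that makes this feel harder than it should be.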