I've been working with microarray data, which of course reports the genotype explicitly at every SNP on the chip.
Now I am trying to incorporate WGS datasets into my project, and I would like to extract the genotypes at the positions that match those in my microarray datasets. My WGS data is in the form of VCF files, which only explicitly report positions where the sample DIFFERS from the reference sequence (e.g. GRCh37/hg19).
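To make the problem concrete, here is a minimal sketch of the naive lookup, assuming an uncompressed, variant-only VCF read as a list of lines, and array SNPs given as (chrom, pos) tuples using the same chromosome naming and coordinates as the VCF. The function names and the "NOT_IN_VCF" label are hypothetical. Positions with a VCF record get the sample's GT value; positions with no record are flagged rather than assumed to be reference, which is exactly the ambiguity described here.

```python
def parse_vcf_genotypes(vcf_lines, sample_index=0):
    """Map (chrom, pos) -> GT string for one sample column of a plain-text VCF."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information (##) and column header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # FORMAT keys (column 9) pair up with the sample's colon-separated values
        fmt_keys = fields[8].split(":")
        sample_vals = fields[9 + sample_index].split(":")
        sample = dict(zip(fmt_keys, sample_vals))
        # Note: a later record at the same position overwrites an earlier one
        genotypes[(chrom, pos)] = sample.get("GT", "./.")
    return genotypes


def lookup_array_positions(array_positions, genotypes):
    """Return a GT per array SNP; absent positions are flagged, NOT assumed to be 0/0."""
    # Assumes chromosome names match (e.g. "1" vs "chr1" would silently never match)
    return {p: genotypes.get(p, "NOT_IN_VCF") for p in array_positions}


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "1\t1000\trs1\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
]
result = lookup_array_positions([("1", 1000), ("1", 2000)], parse_vcf_genotypes(vcf))
print(result)  # {('1', 1000): '0/1', ('1', 2000): 'NOT_IN_VCF'}
```

From the VCF alone, the "NOT_IN_VCF" entries cannot be told apart: the sample might match the reference there, or the site might simply not have been callable.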
My question is about the positions that are NOT represented in the VCF file. Can I presume the sample is homozygous for the reference allele at these positions? How can I tell whether a SNP is missing from the VCF because the sample matches the reference, rather than because the position lies in a low-quality or unsequenced region of the sample OR the reference? How does one reliably extract genotypes at SNP positions where the sample is homozygous for the reference allele?
Thanks for any advice.