I have a bit of an odd question. I have a VCF file containing only variable sites (it was pre-filtered at an MAF of 0.01). I have no access to the raw SNP calls, nor to the alignments I would need to re-call SNPs. However, I need invariant sites for some population genetics analyses. My question is this:
The population was genotyped with a target-capture approach using several thousand 120 bp probes. Thus, any site covered by these probes that wasn't called as a SNP should be invariant. Is there any way to create a VCF (or another kind of variant file) that includes every other site in the probe file (a multi-FASTA) as an invariant site, alongside the actual polymorphic sites?
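One possible first step, sketched in Python: expand each probe into per-site (scaffold, position, base) entries. This assumes, hypothetically, that each FASTA header has the form `scaffold:start-end` with 1-based inclusive coordinates and that every probe sequence is on the reference (+) strand. `read_fasta` and `probe_sites` are illustrative names, not functions from any existing tool.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-FASTA lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def probe_sites(lines: Iterable[str]) -> Dict[Tuple[str, int], str]:
    """Map (scaffold, 1-based position) -> reference base for every probe site.

    ASSUMPTION (hypothetical header format): 'scaffold:start-end', 1-based and
    inclusive, with the probe sequence on the reference (+) strand.
    """
    sites: Dict[Tuple[str, int], str] = {}
    for header, seq in read_fasta(lines):
        scaffold, _, span = header.split()[0].rpartition(":")
        start, end = (int(x) for x in span.split("-"))
        if end - start + 1 != len(seq):
            raise ValueError(f"length mismatch for probe {header}")
        for offset, base in enumerate(seq.upper()):
            # Overlapping probes write to the same key, so each site is stored
            # once (a later probe overwrites an earlier one).
            sites[(scaffold, start + offset)] = base
    return sites
```

Keying the dictionary on (scaffold, position) means tiled or overlapping probes do not produce duplicate sites.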
I should add that the genome is quite fragmented (~67K scaffolds); each probe gives the scaffold it lies on and its region on that scaffold. The VCF uses the scaffold number as CHROM and the position of the SNP on the scaffold as POS. So, somehow, I would need to add every position within each probe's range as its own record, with the nucleotide at that position as REF and ALT set to '.' for those sites.
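Given a table mapping (scaffold, position) to the reference base for every probe site, the invariant records could be built roughly as below. This is a hypothetical Python sketch with made-up function names: it skips probe sites already in the VCF, writes REF as the probe base and ALT as '.', and fills every sample with a 0/0 genotype. That last part is an assumption, since a probe site missing from the VCF might also be a rare variant removed by the MAF 0.01 filter, or simply lack coverage in some samples, and the sketch cannot tell these cases apart.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (CHROM, 1-based POS)


def variant_positions(vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS) for every data line of the existing variants-only VCF."""
    seen: Set[Site] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        seen.add((chrom, int(pos)))
    return seen


def sample_count(vcf_lines: Iterable[str]) -> int:
    """Number of sample columns, read from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return max(0, len(line.rstrip("\n").split("\t")) - 9)
    return 0


def invariant_records(sites: Dict[Site, str], variants: Set[Site], n_samples: int) -> List[str]:
    """VCF data lines (REF = probe base, ALT = '.') for probe sites absent from the VCF.

    ASSUMPTION: every sample is written as homozygous reference (0/0); in reality
    some of these sites may be MAF-filtered variants or uncovered (./.).
    """
    records = []
    for (chrom, pos), base in sites.items():
        if (chrom, pos) in variants:
            continue  # keep the real SNP record for this site instead
        # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then one GT per sample
        fields = [chrom, str(pos), ".", base, ".", ".", ".", ".", "GT"]
        fields += ["0/0"] * n_samples
        records.append("\t".join(fields))
    return records


def merge_records(variant_data: List[str], invariant_data: List[str]) -> List[str]:
    """Combine data lines and sort by (CHROM, POS); CHROM is compared as plain text."""
    def key(line: str) -> Tuple[str, int]:
        chrom, pos = line.split("\t", 2)[:2]
        return chrom, int(pos)

    lines = [l.rstrip("\n") for l in variant_data] + invariant_data
    return sorted(lines, key=key)
```

The merged data lines would still need the original header lines in front of them. Because scaffolds are sorted here as plain text, re-sorting to the contig order in the VCF header (for example with `bcftools sort`) may be needed for tools that expect that order.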
Any suggestions to get me started on this rather convoluted problem would be much appreciated! Thanks!