I am trying to build a linkage map from a list of variants called in a sexual population that we have whole-genome sequenced. Marker density is the least of our problems, so I would like to reduce the ~400,000 SNPs I have high confidence in to ~2,000. However, I would like them to be regularly spaced.
In my head, this means: select a SNP, move 100 kb down the contig, select the next SNP after that 100 kb interval, move another 100 kb, select another SNP, and repeat. We aren't quite sure of the recombination rate, and some of my contigs are shorter than 100 kb (I'd like at least one, maybe two, markers from each of those).
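To make the rule concrete, here is roughly the logic I have in mind as a minimal Python sketch. Treat it as an illustration, not a tested tool. It assumes the VCF is uncompressed and already sorted by contig and then position; the function name `thin_vcf_lines` and the `spacing` parameter are just names made up for this example.

```python
def thin_vcf_lines(lines, spacing=100_000):
    """Yield VCF header lines plus SNPs at least `spacing` bp apart per contig.

    Assumes `lines` are sorted by contig, then by position.
    """
    last_contig = None
    last_pos = None
    for line in lines:
        # Pass header and metadata lines through unchanged.
        if line.startswith("#"):
            yield line
            continue
        # Only CHROM (column 1) and POS (column 2) are needed.
        fields = line.split("\t", 2)
        contig, pos = fields[0], int(fields[1])
        # Keep the first SNP on every contig, then the next SNP that lies
        # at least `spacing` bp beyond the last one kept.
        if contig != last_contig or pos - last_pos >= spacing:
            yield line
            last_contig, last_pos = contig, pos
```

This would be fed an open file handle for the VCF, with the yielded lines written to a new file. Because the first SNP on each contig is always kept, contigs shorter than 100 kb still contribute one marker; getting a second marker from them would need a smaller `spacing` or an extra rule that this sketch does not include.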
I know I could tell GATK to give me a random subset of my markers, but I'd like to do something more methodical. The window-based clustering criteria in VariantFiltration also don't seem very useful, since they use a sliding window rather than fixed bins. Does anyone know of a function in a common toolset that can do this kind of thinning? I'm not proficient in Perl or Python, so I'm unsure how to write my own script.
I have a .intervals file and a multisample VCF file, as well as a .txt table of everything in the VCF file. I intend to load the data into JoinMap to perform the actual mapping.