I need to randomly pick one SNP per scaffold from a .vcf file and write those SNPs to a new .vcf file. The input file contains scaffolds of different lengths with different numbers of SNPs (e.g. some scaffolds have 5 SNPs and others have 200). What I need is similar to PLINK's --thin-count, which removes variants at random until only n remain, except that I want exactly one SNP per scaffold (in other words, remove all SNPs on each scaffold but one).
The second step would be to repeat this resampling several times. Specifically, I am looking for code that produces X .vcf files, each containing a random selection of SNPs, one per scaffold.
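To make the request concrete, here is a rough sketch in Python of the kind of thing I have in mind. It assumes an uncompressed, tab-delimited VCF that fits in memory, with the scaffold name in the first column. The function names `sample_one_per_scaffold` and `write_replicates`, the output naming scheme, and the seeding are hypothetical choices for illustration, not part of any existing tool.

```python
import random


def sample_one_per_scaffold(vcf_lines, rng):
    """Return the header lines plus one randomly chosen record per scaffold."""
    header = []
    chosen = {}  # scaffold name -> currently selected record line
    seen = {}    # scaffold name -> number of records seen so far
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)  # keep meta lines and the #CHROM line as-is
            continue
        if not line.strip():
            continue  # skip blank lines
        scaffold = line.split("\t", 1)[0]
        seen[scaffold] = seen.get(scaffold, 0) + 1
        # Reservoir sampling with k = 1: the n-th record of a scaffold
        # replaces the current pick with probability 1/n, so every record
        # on that scaffold ends up equally likely to be kept.
        if rng.randrange(seen[scaffold]) == 0:
            chosen[scaffold] = line
    # Dicts keep insertion order, so scaffolds come out in the order
    # in which they first appear in the input.
    return header + list(chosen.values())


def write_replicates(in_path, out_prefix, n_replicates, seed=0):
    """Write n_replicates subsampled VCFs named <out_prefix>_<i>.vcf."""
    with open(in_path) as fh:
        lines = fh.readlines()
    paths = []
    for i in range(1, n_replicates + 1):
        rng = random.Random(seed + i)  # a different, reproducible seed per file
        out_path = f"{out_prefix}_{i}.vcf"
        with open(out_path, "w") as out:
            out.writelines(sample_one_per_scaffold(lines, rng))
        paths.append(out_path)
    return paths
```

For example, `write_replicates("input.vcf", "subsample", 10)` would write `subsample_1.vcf` through `subsample_10.vcf` (the file names are placeholders). A gzipped or very large VCF would need a different reading strategy.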
Would this be possible? (Nothing is impossible, right!? ;) ) Any suggestions are welcome.
Thanks in advance,
PS: Let me know if you need more details.