I'm working with a list of 1200 SNPs associated with a complex disease, distributed across the chromosomes. I used vcftools to compute Fst between my population and the 1000 Genomes populations. To assess whether the Fst results are meaningful, I would like to draw random SNPs, from the 1000 Genomes populations and from my population, whose allele frequencies are similar to those of the SNPs in my list. I found this simple command for picking random variants:
grep -v '^#' file.vcf | shuf -n 1200
1) However, I don't know how to restrict the random draw to SNPs whose allele frequencies match (are similar to) those of my SNPs. Could you please share your suggestions?
2) My 1200 SNPs are spread across 22 chromosomes, and the 1000 Genomes data come as a separate VCF file per chromosome. How many SNPs should be randomly extracted from each VCF file?
3) My focus is on Fst analysis. Is matching on allele frequency alone sufficient for selecting the random SNPs, or should other factors also be considered?
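For questions 1 and 2, one approach I am considering is to bin SNPs by minor allele frequency (MAF) and, for each SNP in my list, draw one random SNP from the same chromosome and the same frequency bin, so that the number drawn per chromosome simply mirrors my list. Below is a minimal Python sketch of that idea, not a tested pipeline. The 0.05 bin width, the function names maf_bin and sample_matched, and the (ID, chromosome, MAF) tuples are my own assumptions; in practice the frequencies would have to be parsed from something like the output of vcftools --freq.

```python
import random
from collections import Counter, defaultdict

BIN_WIDTH = 0.05  # assumed MAF bin width; an arbitrary choice, adjust as needed


def maf_bin(maf, width=BIN_WIDTH):
    """Return the index of the MAF bin that a frequency falls into."""
    n_bins = round(0.5 / width)
    # The small epsilon guards against floating-point error at bin edges;
    # the min() puts MAF = 0.5 into the last bin.
    return min(int(maf / width + 1e-9), n_bins - 1)


def sample_matched(targets, candidates, seed=None):
    """Draw one random candidate per target SNP, matched on chromosome
    and MAF bin, without replacement.

    targets:    list of (snp_id, chrom, maf) for the SNPs of interest
    candidates: list of (snp_id, chrom, maf) for the background SNPs
    """
    rng = random.Random(seed)
    target_ids = {snp_id for snp_id, _, _ in targets}

    # Number of SNPs needed in each (chromosome, MAF bin) stratum.
    needed = Counter((chrom, maf_bin(maf)) for _, chrom, maf in targets)

    # Group candidates by stratum, leaving out the target SNPs themselves.
    pool = defaultdict(list)
    for snp_id, chrom, maf in candidates:
        if snp_id not in target_ids:
            pool[(chrom, maf_bin(maf))].append(snp_id)

    matched = []
    for stratum, n in needed.items():
        available = pool.get(stratum, [])
        # If a stratum has too few candidates, take all that exist.
        matched.extend(rng.sample(available, min(n, len(available))))
    return matched


# Toy example with made-up IDs and frequencies (not real data).
example_targets = [("t1", "1", 0.12), ("t2", "1", 0.13), ("t3", "2", 0.41)]
example_candidates = [
    ("c1", "1", 0.11), ("c2", "1", 0.14), ("c3", "1", 0.12),
    ("c4", "1", 0.30), ("c5", "2", 0.42), ("c6", "2", 0.10),
]
print(sample_matched(example_targets, example_candidates, seed=1))
```

Repeating the draw many times with different seeds would give a set of random SNP lists whose Fst values could be compared with the Fst of my list. I am not sure whether this stratification is adequate, which is part of what question 3 asks.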
Thanks in advance