I have two datasets from the same type of array in Plink map/ped format. The smaller one has about 586,662 SNPs and 93 samples; the larger one has about 620,000 SNPs and 934 samples.
I want to merge the two datasets so that I get their intersection, i.e., all 1,027 samples but only the SNPs present in both datasets.
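The SNP half of that intersection can be sketched by comparing SNP IDs (column 2 of each .map file). This is a minimal sketch, assuming both files use the same SNP identifiers; written out one ID per line, the result should be usable as a list for Plink's `--extract` on each dataset before merging. The function names are hypothetical.

```python
def read_map_ids(map_text):
    """Return the SNP IDs (column 2) from the text of a Plink .map file."""
    ids = []
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank lines; column 2 is the SNP ID
            ids.append(fields[1])
    return ids


def shared_snps(map_a_text, map_b_text):
    """SNP IDs present in both .map files, in the order of the first file."""
    in_b = set(read_map_ids(map_b_text))
    return [snp for snp in read_map_ids(map_a_text) if snp in in_b]


if __name__ == "__main__":
    small = "1 rs1 0 1000\n1 rs2 0 2000\n1 rs3 0 3000\n"
    large = "1 rs2 0 2000\n1 rs3 0 3000\n1 rs4 0 4000\n"
    print(shared_snps(small, large))  # ['rs2', 'rs3']
```

For real files, pass `open("small.map").read()` and `open("large.map").read()` (hypothetical file names) and write the returned IDs to a text file, one per line.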
From an attempt a few days ago, I know that about 7,000 sites are reported on opposite strands in the two datasets (e.g., the larger set has a C/A polymorphism where the smaller set has G/T). Fortunately, this array was designed to exclude symmetrical SNPs (A/T or C/G), so fixing this problem is somewhat less confusing; however, I do not have a list of the flipped SNPs.
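The strand logic described above can be sketched as a per-SNP comparison: if the allele sets already agree, nothing needs to change; if they only agree after complementing one dataset (A<->T, C<->G), the SNP is a flip candidate. This is a minimal sketch, assuming the observed alleles of each SNP are available as sets for both datasets; `strand_status` is a hypothetical helper, and SNPs that are monomorphic in one dataset can still be misclassified.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
AMBIGUOUS = ({"A", "T"}, {"C", "G"})  # symmetrical SNPs


def strand_status(alleles_a, alleles_b):
    """Compare the alleles one SNP shows in two datasets.

    Returns "same", "flip", "ambiguous", "mismatch" or "unknown".
    Missing calls ("0") and anything other than A/C/G/T are ignored.
    """
    a = {x for x in alleles_a if x in COMPLEMENT}
    b = {x for x in alleles_b if x in COMPLEMENT}
    if not a or not b:
        return "unknown"  # no called alleles in one of the datasets
    union = a | b
    if union in AMBIGUOUS:
        return "ambiguous"  # A/T or C/G: strand cannot be told from alleles
    if len(union) <= 2:
        return "same"  # alleles already agree
    flipped_b = {COMPLEMENT[x] for x in b}
    if len(a | flipped_b) <= 2:
        return "flip"  # alleles agree after complementing dataset b
    return "mismatch"  # e.g. A/C vs A/G: a strand flip cannot fix this


if __name__ == "__main__":
    print(strand_status({"C", "A"}, {"G", "T"}))  # flip
    print(strand_status({"C", "A"}, {"A", "C"}))  # same
```

SNPs returning "flip" would go into the list passed to `--flip` for one of the two datasets; "mismatch" SNPs are probably safest to exclude from the merge.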
I know I can flip genotypes in Plink using
plink --file data --flip list.txt --flip-subset mylist.txt --recode
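To make concrete what a strand flip does to the genotype data: for every SNP in the list, each A/C/G/T allele is replaced by its complement in every individual. The sketch below applies that to a single .ped line, assuming space-separated, one-allele-per-column genotypes and "0" for missing calls; `flip_ped_line` is a hypothetical illustration, not a replacement for Plink. Note that `--flip-subset` restricts the flip to the individuals listed in its file, so flipping SNPs for a whole dataset should only need `--flip list.txt`.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_ped_line(ped_line, snp_ids, to_flip):
    """Complement the alleles of the SNPs in `to_flip` on one .ped line.

    `snp_ids` lists SNP IDs in .map order. Columns 1-6 of a .ped line are
    FID, IID, father, mother, sex and phenotype, followed by two allele
    columns per SNP. Missing alleles ("0") are left unchanged.
    """
    fields = ped_line.split()
    for i, snp in enumerate(snp_ids):
        if snp in to_flip:
            for col in (6 + 2 * i, 7 + 2 * i):
                fields[col] = COMPLEMENT.get(fields[col], fields[col])
    return " ".join(fields)


if __name__ == "__main__":
    line = "F1 I1 0 0 1 -9 C A G G"
    print(flip_ped_line(line, ["rs1", "rs2"], {"rs2"}))
    # F1 I1 0 0 1 -9 C A C C
```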
My question is: how can I identify these sites and merge the two datasets so that every SNP is reported on the same strand? Thanks
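Identifying the flipped sites needs the alleles of each SNP, and a .map file does not list them, so with map/ped input they have to be read from the .ped genotype columns. This is a minimal sketch, assuming space-separated, one-allele-per-column genotypes in .map order and "0" for missing calls; `observed_alleles` is a hypothetical helper. (Converting each dataset with `--make-bed` would instead give a .bim file whose last two columns are the alleles.)

```python
def observed_alleles(map_text, ped_text):
    """Map each SNP ID to the set of alleles observed in a .ped file."""
    snp_ids = [line.split()[1] for line in map_text.splitlines() if line.strip()]
    alleles = {snp: set() for snp in snp_ids}
    for line in ped_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        genotypes = fields[6:]  # skip FID, IID, father, mother, sex, phenotype
        for i, snp in enumerate(snp_ids):
            for allele in genotypes[2 * i:2 * i + 2]:
                if allele != "0":  # ignore missing calls
                    alleles[snp].add(allele)
    return alleles


if __name__ == "__main__":
    map_text = "1 rs1 0 100\n1 rs2 0 200\n"
    ped_text = "F1 I1 0 0 1 -9 C A G G\nF2 I2 0 0 2 -9 A A 0 0\n"
    print(observed_alleles(map_text, ped_text))
```

Running this on both datasets and comparing each shared SNP's allele set with the complement of the other dataset's set separates SNPs that need a flip from those that already match.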