Question: Merging two Plink datasets, but some sites have reversed genotypes
5.1 years ago
Stony Brook
devenvyas640 wrote:

I have two datasets from the same type of array in Plink map/ped format. The smaller of the two has about 586,662 SNPs and 93 samples, and the larger has about 620,000 SNPs and 934 samples.

I want to merge the two datasets such that I have an intersection of the two (i.e., all 1027 samples but only SNPs present in both data sets).

From an experience a few days ago, I know that the larger dataset has about 7000 sites where the alleles are on the opposite strand from the smaller set (e.g., the larger set is a C/A polymorphism and the smaller set is a G/T polymorphism). Happily, this array was designed to exclude symmetrical SNPs (A/T or C/G), so fixing this problem is a little less confusing; however, I do not have a list of the flipped SNPs.

I know I can flip genotypes in Plink using

plink --file data --flip list.txt --flip-subset mylist.txt --recode

How can I identify these sites and merge the two datasets so that every site is reported on the same strand? Thanks


snp plink
written 5.1 years ago by devenvyas640
5.1 years ago
United States
chrchang523 wrote:

Convert both filesets to binary, and then use --bmerge to try to merge them.  The .missnp file should then list all the loci that need to be flipped.
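
As an independent cross-check on the list of loci to flip, the same sites can be found by comparing the allele columns of the two .bim files directly. The following is only a minimal sketch, not PLINK's own logic: the function names (`read_bim_alleles`, `find_flipped`) and the toy rows are hypothetical, and it assumes the standard six-column .bim layout and that no A/T or C/G SNPs are present, as stated in the question.

```python
# Complement map for strand flipping; "0" is PLINK's missing-allele code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def read_bim_alleles(lines):
    """Map SNP ID -> set of its two alleles, from .bim-style lines
    (columns: chromosome, SNP ID, cM, bp, allele 1, allele 2)."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        alleles[fields[1]] = frozenset(fields[4:6])
    return alleles


def find_flipped(bim_a, bim_b):
    """Return shared SNP IDs whose alleles match only after complementing."""
    a = read_bim_alleles(bim_a)
    b = read_bim_alleles(bim_b)
    flipped = []
    for snp in sorted(a.keys() & b.keys()):
        if a[snp] == b[snp]:
            continue  # same strand already
        if frozenset(COMPLEMENT.get(x, x) for x in a[snp]) == b[snp]:
            flipped.append(snp)
    return flipped


# Toy rows standing in for the two .bim files (hypothetical data).
large = ["1 rs1 0 1000 C A", "1 rs2 0 2000 G A", "1 rs3 0 3000 T C"]
small = ["1 rs1 0 1000 G T", "1 rs2 0 2000 A G", "1 rs4 0 4000 T C"]
print(find_flipped(large, small))  # ['rs1']
```

With real data the two arguments would be open file handles for the .bim files. SNPs that are monomorphic in one dataset (one allele coded `0`) are not flagged by this sketch, so the result may be shorter than the list PLINK reports.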

written 5.1 years ago by chrchang523

To be clear, I would take that missnp file and flip one of the datasets and then re-merge, and then I would be good? (Also, the samples in each dataset are completely different)

Also, it appears that the merge produces a union instead of an intersection, but I don't want SNPs that are present in only one of the two datasets. I only want SNPs present in both.

written 5.1 years ago by devenvyas640

Yes, that's correct, you flip just one dataset and remerge.
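
For readers unsure what "flipping" does to the genotypes, here is a small illustration of the strand complement applied to allele calls. It is a sketch of the idea only (the function name `flip_genotype_calls` and the sample calls are hypothetical), not a replacement for PLINK's --flip.

```python
# Strand complement for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_genotype_calls(calls):
    """Complement every allele call; codes that are not bases
    (e.g. '0' for missing) are passed through unchanged."""
    return [COMPLEMENT.get(allele, allele) for allele in calls]


# Three genotypes at a C/A site: C A, A A, and one missing call.
calls = ["C", "A", "A", "A", "0", "0"]
print(flip_genotype_calls(calls))  # ['G', 'T', 'T', 'T', '0', '0']
```

After the flip, the C/A site reads as G/T, which is why only one of the two datasets should be flipped before remerging.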

You will only get merge conflicts for SNPs present in both datasets.
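
On the intersection question raised earlier in the thread, one possible approach (an assumption on my part, not something confirmed in this thread) is to build the list of SNP IDs present in both .bim files and restrict each dataset to that list before merging. A minimal sketch, with a hypothetical function name and toy rows:

```python
def snp_ids(bim_lines):
    """Collect SNP IDs (second column) from .bim-style lines."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_snp_ids(bim_a, bim_b):
    """Return the sorted SNP IDs that appear in both inputs."""
    return sorted(snp_ids(bim_a) & snp_ids(bim_b))


# Toy rows standing in for the two .bim files (hypothetical data).
large = ["1 rs1 0 1000 C A", "1 rs2 0 2000 G A", "1 rs3 0 3000 T C"]
small = ["1 rs1 0 1000 G T", "1 rs2 0 2000 A G", "1 rs4 0 4000 T C"]
print(shared_snp_ids(large, small))  # ['rs1', 'rs2']
```

Written out one ID per line, this list is in the form that PLINK's --extract option reads, so both filesets could be reduced to the shared SNPs before the final merge. Matching is by SNP ID only; the sketch does not check that positions agree.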

written 5.1 years ago by chrchang523
