Question: Merging two Plink datasets, but some sites have reversed genotypes
5.1 years ago by
Stony Brook
devenvyas640 wrote:

I have two datasets from the same type of array in Plink map/ped format. The smaller of the two has about 586,662 SNPs and 93 samples, and the larger of the two has about 620,000 SNPs and 934 samples.

I want to merge the two datasets such that I have an intersection of the two (i.e., all 1027 samples but only SNPs present in both data sets).

From an experience a few days ago, I know that there are about 7000 sites where the alleles in one dataset are on the opposite strand from those in the other (e.g., the larger set has a C/A polymorphism where the smaller set has a G/T polymorphism). Happily, this array was designed to exclude symmetrical SNPs (A/T or C/G), so fixing this problem is a little less confusing; however, I do not have a list of the flipped SNPs.

I know I can flip genotypes in Plink using

plink --file data --flip list.txt --flip-subset mylist.txt --recode

I was wondering, how can I identify these sites and get these sites merged showing the same strand? Thanks


5.1 years ago by
United States
chrchang5237.1k wrote:

Convert both filesets to binary, and then use --bmerge to try to merge them.  The .missnp file should then list all the loci that need to be flipped.
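As a rough sketch of that workflow, assuming PLINK 1.9 is on the PATH as `plink` and the two text filesets use the hypothetical prefixes `small` and `large`, the conversion and trial merge might look like the function below. PLINK 1.9 writes the conflict list as `<out>-merge.missnp`; older versions may name it `<out>.missnp`, so the sketch checks for both rather than assuming one.

```shell
#!/usr/bin/env bash
# Sketch only: the prefixes passed in (e.g. "small", "large") are hypothetical.

trial_merge() {
    local small="$1" large="$2" out="$3"

    # Convert each .ped/.map fileset to binary (.bed/.bim/.fam).
    plink --file "$small" --make-bed --out "$small" >&2 || return 1
    plink --file "$large" --make-bed --out "$large" >&2 || return 1

    # Trial merge. A failure here is expected when allele conflicts exist;
    # the failed merge is what writes the .missnp list, so ignore the status.
    plink --bfile "$large" \
          --bmerge "${small}.bed" "${small}.bim" "${small}.fam" \
          --make-bed --out "$out" >&2 || true

    # Print the name of whichever conflict list was written, if any.
    local f
    for f in "${out}-merge.missnp" "${out}.missnp"; do
        if [ -s "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 0   # nothing printed: no conflict list was found
}

# Usage (hypothetical prefixes):
#   missnp=$(trial_merge small large trial)
```

It is worth inspecting a few entries of the resulting list against the two .bim files before flipping, since a conflict can also come from a genuinely different allele rather than a strand flip.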


To be clear, I would take that missnp file and flip one of the datasets and then re-merge, and then I would be good? (Also, the samples in each dataset are completely different)

Also, it appears that the merge mode produces a union instead of an intersection, but I don't want the SNPs that are only present in one of the two datasets. I only want SNPs present in both.

Reply written 5.1 years ago by devenvyas

Yes, that's correct, you flip just one dataset and remerge.
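A minimal sketch of the flip-and-remerge step, assuming PLINK 1.9, binary filesets with the hypothetical prefixes `small` and `large`, and a `.missnp` list left by a failed trial merge. Flipping the smaller dataset is an arbitrary choice here; either one would do.

```shell
#!/usr/bin/env bash
# Sketch only: prefixes and the .missnp path are hypothetical.

flip_and_remerge() {
    local small="$1" large="$2" missnp="$3" out="$4"

    # Flip the strand of the listed SNPs in the smaller dataset only.
    plink --bfile "$small" --flip "$missnp" \
          --make-bed --out "${small}_flipped" >&2 || return 1

    # Merge again, this time against the flipped copy.
    plink --bfile "$large" \
          --bmerge "${small}_flipped.bed" "${small}_flipped.bim" "${small}_flipped.fam" \
          --make-bed --out "$out" >&2
}

# Usage (hypothetical names):
#   flip_and_remerge small large trial-merge.missnp merged
```

If the re-merge still writes a `.missnp` file, the SNPs in it are probably not simple strand flips, and dropping them with `--exclude` may be the safer option.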

You will only get merge conflicts for SNPs present in both datasets.
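Since the merge itself keeps SNPs found in either dataset, one way to end up with only the shared SNPs is to list the IDs that occur in both .bim files and hand that list to `--extract`. The sketch below assumes the SNP IDs match between the two files; the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: file names are hypothetical.
# Column 2 of a .bim file holds the SNP ID.

common_snps() {
    local bim_a="$1" bim_b="$2"
    # First pass remembers every ID in bim_a; second pass prints the IDs
    # from bim_b that were also seen in bim_a.
    awk 'NR == FNR { seen[$2] = 1; next } ($2 in seen) { print $2 }' \
        "$bim_a" "$bim_b"
}

# Usage (hypothetical names):
#   common_snps small.bim large.bim > common.snps
#   plink --bfile merged --extract common.snps --make-bed --out merged_common
```

Applying `--extract` to the already merged fileset, as in the usage comment, keeps the filtering as a separate step that is easy to check by counting lines in the resulting .bim file.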

Reply written 5.1 years ago by chrchang523