Question

Merging two Plink datasets, but some sites have reversed genotypes

8.8 years ago

devenvyas ▴ 740

I have two datasets from the same type of array in Plink map/ped format. The smaller of the two has about 586,662 SNPs and 93 samples, and the larger has about 620,000 SNPs and 934 samples.

I want to merge the two datasets such that I have an intersection of the two (i.e., all 1027 samples but only the SNPs present in both datasets).

From an experience a few days ago, I know that there are about 7000 sites where the alleles in one dataset are the complement of those in the other (e.g., the larger set has a C/A polymorphism where the smaller set has a G/T polymorphism). Happily, this array was designed to exclude symmetrical SNPs (A/T or C/G), so fixing this problem is a little less confusing; however, I do not have a list of the flipped SNPs.
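The strand issue above comes down to base complementarity: a C/A SNP read from the opposite strand appears as G/T, while A/T and C/G SNPs look identical on both strands, which is why excluding them makes flips unambiguous. A minimal Python sketch of that classification, assuming each SNP is given as its two allele letters (A/C/G/T only, no missing "0" calls); the function names are hypothetical and not part of PLINK:

```python
# Complementary base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the allele set as it would read on the opposite strand."""
    return {COMPLEMENT[a] for a in alleles}


def is_ambiguous(alleles):
    """A/T and C/G SNPs equal their own complement, so strand cannot be inferred."""
    return set(alleles) == complement(alleles)


def classify(alleles_a, alleles_b):
    """Compare one SNP's alleles between two datasets (assumes two called alleles each)."""
    a, b = set(alleles_a), set(alleles_b)
    if is_ambiguous(a) or is_ambiguous(b):
        return "ambiguous"
    if a == b:
        return "match"
    if a == complement(b):
        return "flip"
    return "mismatch"


print(classify("CA", "GT"))  # flip
print(classify("CA", "AC"))  # match
print(classify("AT", "TA"))  # ambiguous
```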

I know I can flip genotypes in Plink using

plink --file data --flip list.txt --flip-subset mylist.txt --recode
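For reference, flipping a SNP replaces each allele call with its complementary base (A with T, C with G, and vice versa). The sketch below applies that to a single line of a .ped file, where the first six columns are family ID, individual ID, paternal ID, maternal ID, sex and phenotype, followed by two allele columns per SNP in .map order. It only illustrates the effect of a flip and is not how PLINK implements --flip; the SNP IDs, flip set and function name are hypothetical, and "0" is treated as a missing allele that stays unchanged:

```python
# Complement table; "0" is PLINK's missing-allele code and is left as-is.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def flip_ped_row(row, snp_ids, flip_ids):
    """Complement both allele calls for every SNP in flip_ids.

    row:      one whitespace-delimited .ped line
    snp_ids:  SNP IDs in the same order as the .map file
    flip_ids: set of SNP IDs to flip
    """
    fields = row.split()
    header, genotypes = fields[:6], fields[6:]
    for i, snp in enumerate(snp_ids):
        if snp in flip_ids:
            j = 2 * i  # each SNP occupies two consecutive allele columns
            genotypes[j] = COMPLEMENT[genotypes[j]]
            genotypes[j + 1] = COMPLEMENT[genotypes[j + 1]]
    return " ".join(header + genotypes)


row = "FAM1 IND1 0 0 1 -9 C A G G A A"
print(flip_ped_row(row, ["rs1", "rs2", "rs3"], {"rs1", "rs3"}))
# FAM1 IND1 0 0 1 -9 G T G G T T
```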

I was wondering: how can I identify these sites and merge the two datasets so that they are on the same strand? Thanks

plink SNP

Ram · Accepted Answer · 2015-06-22

8.8 years ago

chrchang523 10k

Convert both filesets to binary, and then use --bmerge to try to merge them. The .missnp file should then list all the loci that need to be flipped.

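Why the .missnp file catches flipped sites: once a SNP's alleles from the two filesets are pooled, a strand flip of a non-ambiguous SNP yields four distinct alleles (e.g. C/A plus G/T), and as far as I know PLINK refuses to merge any SNP with more than two alleles and lists it in .missnp. A rough Python emulation of that check on .bim lines (columns: chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2); the SNP IDs and function names are hypothetical, and PLINK's own output may cover cases this sketch ignores:

```python
def read_bim_alleles(lines):
    """Map SNP ID (column 2) -> set of its two alleles (columns 5 and 6)."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 6:
            alleles[fields[1]] = {fields[4], fields[5]}
    return alleles


def merge_conflicts(bim_a, bim_b):
    """SNPs present in both sets whose pooled alleles exceed two."""
    shared = bim_a.keys() & bim_b.keys()
    return sorted(snp for snp in shared if len(bim_a[snp] | bim_b[snp]) > 2)


small = read_bim_alleles(["1 rs1 0 1000 C A", "1 rs2 0 2000 G A", "1 rs3 0 3000 T G"])
large = read_bim_alleles(["1 rs1 0 1000 C A", "1 rs2 0 2000 C T", "1 rs4 0 4000 A G"])
print(merge_conflicts(small, large))  # ['rs2']
```

Note that SNPs present in only one set (rs3, rs4 here) can never conflict, which matches the point made later in the thread.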

To be clear: I would take that .missnp file, flip those SNPs in one of the datasets, re-merge, and then I would be good? (Also, the samples in the two datasets are completely different.)

Also, it appears that merge mode (http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#merge) produces a union instead of an intersection, but I don't want the SNPs that are present in only one of the two datasets. I only want SNPs present in both.

written 8.8 years ago by devenvyas ▴ 740
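One way to get the intersection: collect the SNP IDs that appear in both filesets, write them one per line, and restrict each fileset to that list with PLINK's --extract before merging. A minimal Python sketch of the first step, assuming the SNP ID sits in column 2 of each .map or .bim line (true of both formats) and that the same SNP carries the same ID in both datasets; the example lines and function names are hypothetical:

```python
def read_snp_ids(lines):
    """Collect SNP IDs from column 2 of .map or .bim lines, skipping blank lines."""
    return {line.split()[1] for line in lines if line.strip()}


def shared_snps(lines_a, lines_b):
    """Sorted SNP IDs present in both filesets."""
    return sorted(read_snp_ids(lines_a) & read_snp_ids(lines_b))


map_small = ["1 rs1 0 1000", "1 rs2 0 2000", "1 rs3 0 3000"]
map_large = ["1 rs1 0 1000", "1 rs2 0 2000", "1 rs4 0 4000"]
common = shared_snps(map_small, map_large)
print(common)  # ['rs1', 'rs2']

# One ID per line: the plain-text list format that --extract reads.
snplist_text = "\n".join(common) + "\n"
```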

Yes, that's correct: you flip just one dataset and re-merge.

You will only get merge conflicts for SNPs present in both datasets.

written 8.8 years ago by chrchang523 10k
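The flip-and-re-merge loop confirmed above, sketched in Python on allele sets: flip the .missnp SNPs in one dataset only, then re-run the same more-than-two-alleles check. Any SNP that still conflicts after the flip is probably a genuine allele mismatch rather than a strand issue. The data and function names below are hypothetical:

```python
# Complement table for non-ambiguous SNP alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_alleles(bim, flip_ids):
    """Return a copy of {snp: allele set} with the listed SNPs complemented."""
    return {
        snp: ({COMPLEMENT[a] for a in alleles} if snp in flip_ids else alleles)
        for snp, alleles in bim.items()
    }


def merge_conflicts(bim_a, bim_b):
    """SNPs present in both sets whose pooled alleles exceed two."""
    shared = bim_a.keys() & bim_b.keys()
    return sorted(snp for snp in shared if len(bim_a[snp] | bim_b[snp]) > 2)


small = {"rs1": {"C", "A"}, "rs2": {"G", "A"}}
large = {"rs1": {"C", "A"}, "rs2": {"C", "T"}}

missnp = merge_conflicts(small, large)      # ['rs2']
fixed = flip_alleles(small, set(missnp))    # flip only the smaller dataset
print(merge_conflicts(fixed, large))        # []
```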