Good practice for merging different sets of Plink data files
3.1 years ago
guessit • 0

I have two sets of imputed + QCed genetic data in Plink binary format (.bed, .bim, .fam) and would like to merge them (e.g. using plink --merge-list) for subsequent GWAS.

  1. The two data sets are from different populations.
  2. The variant counts in the data sets are very different. One set has about 1.8× as many variants as the other.
  3. The same variant may have different A1/A2 values in the two .bim files.
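
To make point 3 concrete, here is a minimal Python sketch of how the A1/A2 differences between two .bim files could be tallied before deciding on any cleaning. It assumes the two files use the same variant ID scheme and relies on the standard six-column .bim layout. The function names and the category labels are made up for illustration and are not part of PLINK.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Map variant ID -> (A1, A2) from a PLINK .bim file."""
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            # .bim columns: chromosome, variant ID, cM, bp position, A1, A2
            alleles[fields[1]] = (fields[4].upper(), fields[5].upper())
    return alleles


def flip(allele):
    """Reverse-complement an allele; unknown symbols are left unchanged."""
    return "".join(COMPLEMENT.get(base, base) for base in reversed(allele))


def classify(first, second):
    """Compare the (A1, A2) pair of one variant across two data sets."""
    a1, a2 = first
    b1, b2 = second
    if a1 == flip(a2):
        # A/T and C/G variants: strand cannot be inferred from alleles alone
        return "ambiguous"
    if (a1, a2) == (b1, b2):
        return "same"
    if (a1, a2) == (b2, b1):
        return "swapped"
    if (flip(a1), flip(a2)) == (b1, b2):
        return "flipped"
    if (flip(a1), flip(a2)) == (b2, b1):
        return "flipped_swapped"
    return "mismatch"


def summarize(bim_a, bim_b):
    """Count each category over the variant IDs present in both files."""
    shared = bim_a.keys() & bim_b.keys()
    return Counter(classify(bim_a[v], bim_b[v]) for v in shared)
```

With hypothetical file names, `summarize(read_bim("set1.bim"), read_bim("set2.bim"))` would give a count per category. The "swapped" variants are the ones the A1/A2 question is about, while "flipped" and "ambiguous" point to strand issues that are a separate concern.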

I wonder whether I can merge the files directly, or whether some cleaning is needed beforehand. More specifically,

  1. Should I keep only the variants present in both data sets?
  2. Do I need to fix the A1/A2 coding before merging, so that the same variant has the same A1/A2 in the two data sets? If so, how can I do this?
    • Some of my analyses use the merged data in the additive + dominance component format (generated using plink --recodeAD). This format counts A1 alleles, so if I do not make the A1/A2 coding consistent before merging, will the recoded merged data be problematic?
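
The concern about the additive + dominance format can be illustrated with a small Python sketch. It is a simplified model of the coding, not PLINK's own implementation. The additive column is taken to be the number of A1 alleles in a genotype, and the dominance column is 1 for heterozygotes and 0 otherwise. The function name `recode_ad` is hypothetical.

```python
def recode_ad(genotype, a1):
    """Return (additive, dominance) for one genotype.

    genotype: pair of allele codes, e.g. ("A", "G")
    a1: the allele designated as A1 for this variant
    """
    additive = sum(1 for allele in genotype if allele == a1)
    dominance = 1 if genotype[0] != genotype[1] else 0
    return additive, dominance


# The same three genotypes, coded with A1 = "A" and then with A1 = "G"
genotypes = [("A", "A"), ("A", "G"), ("G", "G")]
coded_a = [recode_ad(g, "A") for g in genotypes]
coded_g = [recode_ad(g, "G") for g in genotypes]
```

Under this model, swapping the A1 designation turns an additive value x into 2 - x and leaves the dominance value unchanged. If two data sets were recoded separately with different A1 alleles and then stacked, the additive column would count different alleles in the two halves.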

Any advice will be much appreciated!

Plink GWAS • 647 views