Question

Good practice for merging different sets of Plink data files

4.4 years ago

guessit • 0

I have two sets of imputed and QCed genotype data in Plink binary format (.bed, .bim, .fam) and would like to merge them (e.g. using plink --merge-list) for a subsequent GWAS.

- The two data sets are from different populations.
- The variant counts differ considerably: one set has about 1.8× as many variants as the other.
- The same variant may have different A1/A2 values in the two .bim files.
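Before deciding how much cleaning is needed, it may help to see how the allele codes of the shared variants actually relate. The Python sketch below compares two .bim files, assuming both sets use the same variant IDs (e.g. rsIDs) and the standard .bim layout where column 5 is A1 and column 6 is A2. It sorts each shared variant into one category: identical coding, A1/A2 swapped, strand-flipped, flipped and swapped, strand-ambiguous (A/T or C/G), or a genuine allele mismatch. The function names are hypothetical helpers, not part of Plink.

```python
import sys
from pathlib import Path

# Complementary bases, used to detect strand flips between data sets
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# A/T and C/G SNPs: strand cannot be inferred from the allele letters alone
PALINDROMIC = [{"A", "T"}, {"C", "G"}]


def read_bim(path):
    """Map variant ID -> (A1, A2) from a Plink .bim file."""
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            # .bim columns: chromosome, variant ID, cM position, bp position, A1, A2
            alleles[fields[1]] = (fields[4].upper(), fields[5].upper())
    return alleles


def flip(pair):
    """Complement both alleles (unknown codes such as '0' are left as-is)."""
    return tuple(COMPLEMENT.get(allele, allele) for allele in pair)


def classify(pair1, pair2):
    """Describe how the (A1, A2) coding in set 2 relates to set 1."""
    if set(pair1) in PALINDROMIC and set(pair1) == set(pair2):
        return "ambiguous"
    if pair1 == pair2:
        return "same"
    if pair1 == pair2[::-1]:
        return "swapped"  # same alleles, A1/A2 order reversed
    if pair1 == flip(pair2):
        return "flipped"  # opposite strand, same order
    if pair1 == flip(pair2)[::-1]:
        return "flipped+swapped"  # opposite strand and reversed order
    return "mismatch"  # different alleles under the same ID


def compare_bims(path1, path2):
    """Return the shared variant IDs and a dict of category -> variant IDs."""
    bim1, bim2 = read_bim(path1), read_bim(path2)
    shared = [vid for vid in bim1 if vid in bim2]
    report = {}
    for vid in shared:
        report.setdefault(classify(bim1[vid], bim2[vid]), []).append(vid)
    return shared, report


if __name__ == "__main__" and len(sys.argv) == 3:
    shared, report = compare_bims(sys.argv[1], sys.argv[2])
    print(f"{len(shared)} variant IDs present in both sets")
    for category, ids in sorted(report.items()):
        print(f"{category}: {len(ids)}")
    # One variant ID per line, e.g. for use with plink --extract
    Path("shared_ids.txt").write_text("".join(f"{vid}\n" for vid in shared))
```

It would be run as `python compare_bim.py setA.bim setB.bim` (hypothetical file names). The category counts give a rough idea of whether the differences are mostly harmless A1/A2 swaps or true strand and allele conflicts.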

I wonder whether I can merge the files directly or whether some cleaning is needed beforehand. More specifically:

- Should I keep only the variants present in both data sets?
- Do I need to fix the A1/A2 coding before merging, so that the same variant has the same A1/A2 in both data sets? If so, how can I do this?
  - Some of my analyses use the merged data in the additive + dominant component format (generated with plink --recodeAD). This format counts A1 alleles; if I do not make the A1/A2 coding consistent before merging, will the recoded merged data be problematic?
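To see why the A1 choice matters for the additive + dominant format, here is a minimal Python sketch with made-up genotypes for one SNP. It assumes that the additive component is the number of copies of the counted allele (A1) and that the dominance component is 1 for heterozygotes and 0 for homozygotes; missing genotypes are ignored. The function `ad_components` is a hypothetical helper, not a Plink function. The same three genotypes are coded once with A counted and once with G counted.

```python
def ad_components(genotype, a1):
    """Return (additive, dominance) for one genotype, e.g. ("A", "G").

    additive:  number of copies of the counted allele a1 (0, 1 or 2)
    dominance: 1 for a heterozygote, 0 for a homozygote
    """
    additive = sum(allele == a1 for allele in genotype)
    dominance = int(genotype[0] != genotype[1])
    return additive, dominance


# Hypothetical genotypes for one SNP with alleles A and G
genotypes = [("G", "G"), ("A", "G"), ("A", "A")]

counted_as_A = [ad_components(gt, "A") for gt in genotypes]  # cohort with A1 = A
counted_as_G = [ad_components(gt, "G") for gt in genotypes]  # cohort with A1 = G

for gt, a_code, g_code in zip(genotypes, counted_as_A, counted_as_G):
    print("/".join(gt), "A1=A:", a_code, "A1=G:", g_code)
```

Under these assumptions, swapping A1 turns each additive value x into 2 - x while the heterozygote indicator stays the same. So if two cohorts were stacked with different A1 choices for the same variant, only the additive column would be reversed, and only for one of the cohorts.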

Any advice will be much appreciated!

Plink GWAS