3.1 years ago
guessit
I have two sets of imputed + QCed genetic data in Plink binary format (`.bed`, `.bim`, `.fam`) and would like to merge them (e.g. using `plink --merge-list`) for subsequent GWAS.
- The two data sets are from different populations.
- The variant counts in the two data sets are very different: one set has about 1.8× as many variants as the other.
- The same variant may have different A1/A2 values in the two `.bim` files.
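
The overlap and the A1/A2 differences described above can be quantified before attempting a merge. The following is a minimal sketch, not part of Plink: the helper names (`read_bim`, `classify`, `compare_bims`) are hypothetical, and it assumes the two data sets use the same variant IDs in column 2 of the `.bim` files (if they do not, matching on chromosome and position would be needed instead). It only reports categories; it does not modify any files.

```python
from collections import Counter

# Complementary bases, used to detect possible strand flips.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Return {variant_id: (A1, A2)} from a Plink .bim file.

    A .bim line has six whitespace-separated fields:
    chromosome, variant ID, genetic distance, bp position, A1, A2.
    """
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            alleles[fields[1]] = (fields[4].upper(), fields[5].upper())
    return alleles


def classify(pair_a, pair_b):
    """Classify how the (A1, A2) pairs of one variant relate across two files."""
    a1, a2 = pair_a
    b1, b2 = pair_b
    # A/T and C/G SNPs: strand cannot be inferred from the alleles alone,
    # so they are reported separately even when the codes look identical.
    if COMPLEMENT.get(a1) == a2 and {a1, a2} == {b1, b2}:
        return "palindromic"
    if (a1, a2) == (b1, b2):
        return "same"
    if (a1, a2) == (b2, b1):
        return "swapped"
    flipped = (COMPLEMENT.get(b1), COMPLEMENT.get(b2))
    if flipped == (a1, a2):
        return "flipped"
    if flipped == (a2, a1):
        return "flipped_swapped"
    return "mismatch"


def compare_bims(path_a, path_b):
    """Summarise variant overlap and allele-coding categories for two .bim files."""
    bim_a = read_bim(path_a)
    bim_b = read_bim(path_b)
    shared = bim_a.keys() & bim_b.keys()
    counts = Counter(classify(bim_a[v], bim_b[v]) for v in shared)
    return {
        "only_a": len(bim_a) - len(shared),
        "only_b": len(bim_b) - len(shared),
        "shared": len(shared),
        "categories": dict(counts),
    }
```

Calling `compare_bims("set1.bim", "set2.bim")` (file names are placeholders) would return the number of variants unique to each set, the number shared, and how many shared variants fall into each category.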
I wonder whether I can merge the files directly, or whether some cleaning is needed beforehand. More specifically,
- Should I keep only the variants present in both data sets?
- Do I need to fix the A1/A2 coding before merging, so that the same variant has the same A1/A2 in the two data sets? If so, how can I do this?
- Some of my analyses involve analyzing the merged data in the additive + dominant component format (generated using `plink --recodeAD`). This format counts A1 alleles -- if I do not make the A1/A2 coding consistent before merging, will the formatted merged data be problematic?
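
To make the concern in the last question concrete, here is a toy sketch of what "counting A1 alleles" means for a single variant. It is an illustration only, not Plink's implementation: `recode_ad` is a hypothetical helper, and the genotypes are made up. It assumes the additive component is the number of copies of the counted allele and the dominant component is a heterozygote indicator.

```python
def recode_ad(genotype, counted_allele):
    """Return (additive, het) for one genotype.

    additive: copies of `counted_allele` in the genotype (0, 1 or 2)
    het:      1 if the genotype is heterozygous, else 0
    """
    additive = sum(1 for allele in genotype if allele == counted_allele)
    het = 1 if additive == 1 else 0
    return additive, het


# Made-up genotypes for one A/G variant.
genotypes = [("A", "A"), ("A", "G"), ("G", "G")]

for genotype in genotypes:
    coded_a = recode_ad(genotype, "A")  # data set where A1 = A
    coded_g = recode_ad(genotype, "G")  # data set where A1 = G
    print(genotype, coded_a, coded_g)
```

In this toy example the additive value under one coding is 2 minus the value under the other, while the heterozygote indicator is unchanged. So if the counted allele differed between samples within one merged column, the additive values would not be comparable across those samples.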
Any advice will be much appreciated!