I am after some advice on the best way to correct for differences in allele codes at a given SNP when merging data across multiple files. I have data in PLINK format (bed/bim/fam) for several populations. When I attempt to merge the data using PLINK as follows:
plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata
I get reports of +/- strand issues, and a file listing the problem SNPs (a .missnp file) is generated.
Looking at the .bim files for each population at these problem SNPs, I see allele codes such as the following:
Pop1: rs1000000 A G
Pop2: rs1000000 T C
Pop3: rs1000000 A G
This suggests to me that the Pop2 alleles are coded on the opposite strand (a strand flip).
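The reasoning above (A/G on one strand is T/C on the other) can be checked per SNP by comparing the two allele pairs directly and again after complementing one of them. This is a minimal sketch, assuming alleles are coded as A/C/G/T, with any other code (such as PLINK's missing-allele code 0) left unchanged; the function names and labels are hypothetical. A/T and C/G SNPs cannot be resolved this way, because their complement is the same pair.

```python
# Complement of each base on the opposite DNA strand; any other code
# (e.g. PLINK's missing-allele code 0) is left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the opposite-strand version of an allele pair."""
    return tuple(COMPLEMENT.get(a, a) for a in alleles)


def classify(ref, other):
    """Compare two (allele 1, allele 2) pairs reported for the same SNP.

    Allele order is ignored, since files may list the alleles differently.
    Returns:
      "match"     - same alleles on the same strand
      "ambiguous" - A/T or C/G SNP; a flip cannot be seen from the alleles
      "flip"      - other is the strand complement of ref
      "mismatch"  - the alleles disagree even after complementing
    """
    ref_set, other_set = set(ref), set(other)
    if ref_set == other_set:
        # A/T and C/G pairs equal their own complement, so they look
        # identical whether or not the strand was flipped.
        return "ambiguous" if ref_set == set(complement(ref)) else "match"
    if set(complement(ref)) == other_set:
        return "flip"
    return "mismatch"


print(classify(("A", "G"), ("T", "C")))  # Pop1 vs Pop2: flip
print(classify(("A", "G"), ("A", "G")))  # Pop1 vs Pop3: match
```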
Is there any software that can account for these differences when merging SNP data? This must be a common problem. Or does each of these flips need to be identified computationally and then corrected by updating the allele information in PLINK, as follows:
plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile
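If the flips do have to be handled by hand, the list for --update-alleles could be generated from the .bim files rather than written manually. This is a minimal sketch, assuming one population (e.g. Pop1) is taken as the strand reference and the list uses the five-column layout PLINK documents for --update-alleles (SNP ID, old allele 1, old allele 2, new allele 1, new allele 2). The function and file names are hypothetical, and A/T and C/G SNPs are deliberately skipped because their flips cannot be detected from allele codes.

```python
# Complement of each base on the opposite DNA strand; any other code
# (e.g. the missing-allele code 0) is left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Map SNP ID -> (allele 1, allele 2) from a PLINK .bim file.

    .bim columns: chromosome, SNP ID, cM position, bp position,
    allele 1, allele 2.
    """
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 6:
                alleles[fields[1]] = (fields[4], fields[5])
    return alleles


def flip_updates(ref, target):
    """Yield (snp, old_a1, old_a2, new_a1, new_a2) for every SNP whose
    target alleles are the strand complement of the reference alleles.

    A/T and C/G SNPs are skipped: their complement is the same pair,
    so a flip cannot be detected from allele codes alone.
    """
    for snp, old in target.items():
        if snp not in ref:
            continue  # SNP absent from the reference population
        new = tuple(COMPLEMENT.get(a, a) for a in old)
        if set(new) == set(old):
            continue  # palindromic (ambiguous) or all-missing SNP
        if set(new) == set(ref[snp]):
            # Keep the target's allele order, just on the other strand.
            yield (snp, old[0], old[1], new[0], new[1])


def write_update_file(ref_bim, target_bim, out_path):
    """Write a whitespace-delimited --update-alleles list for target_bim."""
    ref = read_bim(ref_bim)
    target = read_bim(target_bim)
    with open(out_path, "w") as out:
        for row in flip_updates(ref, target):
            out.write(" ".join(row) + "\n")
```

For example, write_update_file("pop1.bim", "pop2.bim", "mylist.txt") would write the line `rs1000000 T C A G` for the SNP above. PLINK also has a --flip option that takes a plain list of SNP IDs (the first column of this file), which may be simpler; either way, re-running the merge afterwards shows whether any mismatches remain.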
Thanks in advance.