I want to merge a PLINK fileset with itself while keeping a high genotyping rate. Say I have the fileset test.bed/bim/fam; this is what I am doing:
- Randomly select a few SNPs from test.bim and save their IDs to snpstoflip.txt.
- Flip those SNPs:
  ./plink --bfile test --flip snpstoflip.txt --make-bed --out flipped_test
- Rename the samples and SNPs in flipped_test so they are distinct from the originals (e.g. by adding "Dup_" in front of every sample ID and rsID).
- Merge the two filesets:
  ./plink --bfile test --bmerge flipped_test --make-bed --out MergedFiles
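A likely reason for the drop is that the rsIDs in the copy carry the "Dup_" prefix, so --bmerge treats them as new variants and takes the union of both files. The merged fileset then has twice the samples and twice the SNPs, but each sample is genotyped only at its own half of the SNPs. With N samples and M SNPs, that is 2NM called genotypes out of 2N × 2M, which gives a rate of 0.5. If you rename only the sample IDs (FID and IID) and leave the SNP IDs untouched, the merge should add samples without adding variants.

The sketch below assumes a POSIX-style shell with awk and PLINK 1.9's --update-ids option. make_id_map, rename_samples and the PLINK variable are hypothetical helpers, not part of PLINK.

One caveat applies if the copy keeps the original SNP IDs. In that case, SNPs flipped with --flip will carry complementary alleles in the copy, and PLINK will likely report them as allele mismatches (listing them in a .missnp file) instead of merging them. You may therefore need to drop the --flip step unless the flipped alleles are intended.

```shell
# Path to the PLINK 1.9 binary (hypothetical default; override via $PLINK).
PLINK=${PLINK:-./plink}

# make_id_map FAM PREFIX OUT
# Writes an --update-ids table (old FID, old IID, new FID, new IID) that
# adds PREFIX to every family and individual ID listed in FAM.
make_id_map() {
  local fam=$1 prefix=$2 out=$3
  awk -v p="$prefix" '{ print $1, $2, p $1, p $2 }' "$fam" > "$out"
}

# rename_samples IN PREFIX OUT
# Copies fileset IN to OUT with renamed samples; variant IDs are unchanged,
# so merging OUT back into IN adds samples but no new SNPs.
rename_samples() {
  local in=$1 prefix=$2 out=$3
  make_id_map "${in}.fam" "$prefix" "${out}_ids.txt"
  "$PLINK" --bfile "$in" --update-ids "${out}_ids.txt" --make-bed --out "$out"
}
```

With these helpers, the duplicate-and-merge steps become `rename_samples test Dup_ dup_test`, followed by `./plink --bfile test --bmerge dup_test --make-bed --out MergedFiles`.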
However, after doing this the genotyping rate drops sharply, to about 0.5, which means the merged file has a lot of missing genotypes. The idea is to replicate the samples many times to generate a large dataset from a relatively small one. Is there a better way to achieve this without introducing so many missing entries? Any views on how to tackle this would be appreciated.
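To scale up by more than one extra copy, you can repeat the same idea with a different sample prefix per copy and then merge all copies in one pass with --merge-list. In PLINK 1.9, a merge-list line that contains a single name is read as a .bed/.bim/.fam prefix. The sketch below makes the same assumptions as the previous one. replicate_samples, the "DupK_" prefixes and the output file names are hypothetical choices.

```shell
# Path to the PLINK 1.9 binary (hypothetical default; override via $PLINK).
PLINK=${PLINK:-./plink}

# replicate_samples IN COPIES OUT
# Writes COPIES-1 renamed copies of fileset IN (samples prefixed "Dup2_",
# "Dup3_", ...) and merges them with IN into OUT. Variant IDs are never
# changed, so the merge only adds samples.
replicate_samples() {
  local in=$1 copies=$2 out=$3
  local list="${out}_merge_list.txt"
  if [ "$copies" -lt 2 ]; then
    echo "replicate_samples: COPIES must be at least 2" >&2
    return 1
  fi
  : > "$list"   # start with an empty merge list
  for i in $(seq 2 "$copies"); do
    # --update-ids table: old FID, old IID, new FID, new IID
    awk -v p="Dup${i}_" '{ print $1, $2, p $1, p $2 }' "${in}.fam" > "${out}_ids${i}.txt"
    "$PLINK" --bfile "$in" --update-ids "${out}_ids${i}.txt" \
      --make-bed --out "${out}_copy${i}" || return 1
    echo "${out}_copy${i}" >> "$list"   # one fileset prefix per line
  done
  # Merge the original with every renamed copy in a single pass.
  "$PLINK" --bfile "$in" --merge-list "$list" --make-bed --out "$out"
}
```

For example, `replicate_samples test 10 test_x10` should produce test_x10.bed/bim/fam. That fileset would have ten times as many samples as test and the same SNPs.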