I have a PLINK PED file and want to remove duplicate individuals from it using PLINK.
Is there an option in PLINK to do this? If not, is there another tool or option that can?
Thank you
From the PLINK manual:
"The IDs are alphanumeric: the combination of family and individual ID should uniquely identify a person"
How are you defining your duplicates?
I suggest you run an IBS/IBD analysis to identify duplicates, then decide which copy of each duplicate to drop based on missingness (i.e., keep the individual with the most genotyped SNPs).
If you already know which IDs to keep or remove, look into the --keep or --remove options.
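The IBS/IBD-then-missingness suggestion could be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: it assumes you have already produced a pairwise IBD table (PLINK 1.x `--genome` writes a `.genome` file with `FID1 IID1 FID2 IID2 ... PI_HAT` columns) and a per-individual missingness table (`--missing` writes a `.imiss` file with `FID IID ... F_MISS`). The function names, the `pi_hat_min=0.9` cutoff, and the sample ID `2204b` are hypothetical choices made for the example.

```python
def parse_table(text):
    """Parse a whitespace-delimited table with a header row into a list of dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def duplicates_to_remove(genome_text, imiss_text, pi_hat_min=0.9):
    """For each pair with PI_HAT >= pi_hat_min, mark the member with the
    higher missingness rate (F_MISS) for removal.

    The 0.9 threshold is an assumption for illustration; choose a cutoff
    that suits your data.
    """
    f_miss = {
        (row["FID"], row["IID"]): float(row["F_MISS"])
        for row in parse_table(imiss_text)
    }
    removed = set()
    for row in parse_table(genome_text):
        if float(row["PI_HAT"]) < pi_hat_min:
            continue  # not similar enough to be treated as a duplicate
        a = (row["FID1"], row["IID1"])
        b = (row["FID2"], row["IID2"])
        if a in removed or b in removed:
            continue  # this pair is already resolved by an earlier removal
        # Drop the sample with more missing genotypes (on a tie, drop b).
        removed.add(a if f_miss[a] > f_miss[b] else b)
    return sorted(removed)


# Hypothetical, trimmed-down tables (real files carry more columns).
genome_example = """FID1 IID1 FID2 IID2 PI_HAT
1 2204 1 2204b 1.0
1 2204 2 1146 0.02
"""
imiss_example = """FID IID F_MISS
1 2204 0.01
1 2204b 0.05
2 1146 0.00
"""

to_remove = duplicates_to_remove(genome_example, imiss_example)
for fid, iid in to_remove:
    print(fid, iid)  # two columns: family ID, individual ID
```

The printed two-column list has the same layout as the ID file that --remove expects, so it could be saved and passed to that option.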
Hi zx8754, I am trying to remove some patients from my data, starting from the original PED file. I created a .txt file listing the family ID and individual ID of each patient I want to remove, in two columns, but it still doesn't work. The run seems to proceed to the end of the process (creating temporary files), and then this message appears: Error: duplicates ID.
My command is: $ ./plink --file name --remove IDlist.txt --out subset2 --make-bed
And my IDlist.txt is:
1 2204
2 1146
So I know I have a few duplicates, but I don't understand why their presence prevents the removal from working.
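One way to see which IDs trigger the error is to count family ID / individual ID pairs directly in the PED file, outside PLINK. The sketch below assumes the standard PED layout, where the first two whitespace-separated fields of each line are the family ID and the individual ID; the function name and the toy lines are made up for illustration.

```python
from collections import Counter


def duplicated_ids(ped_lines):
    """Return {(family_id, individual_id): count} for every ID pair that
    appears on more than one line of a PED file."""
    counts = Counter()
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[(fields[0], fields[1])] += 1
    return {ids: n for ids, n in counts.items() if n > 1}


# Toy PED lines (hypothetical genotypes; only the first two columns matter here).
example_lines = [
    "1 2204 0 0 1 1 A A G G",
    "2 1146 0 0 2 1 A G G G",
    "1 2204 0 0 1 1 A A G G",
]

dups = duplicated_ids(example_lines)
for (fid, iid), n in dups.items():
    print(fid, iid, n)
```

To run it on a real file, pass an open file handle instead of the list, for example the handle for name.ped from the command above. Note that an ID list given to --remove names both copies of a repeated ID pair, so it cannot single out one copy; one of the duplicated rows would have to be renamed or deleted in the PED file itself first.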
Discussion follows here: Problem to remove subset of patients with plink