Question: Removing duplicated-by-position SNPs with PLINK fails on 1000 Genomes phase 3 data
Opal wrote 2.5 years ago:


I intended to download the chr22 genotype data from 1000 Genomes phase 3 and extract SNPs in high LD with my selected SNP list (test.txt) using PLINK. Let's say my test.txt file contains only the following line:


When I ran

plink --vcf chr22.phase3.vcf.gz --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

the error message said

Error: Duplicate ID 'rs10656307'.

Following suggestions from previous questions and answers posted on this forum, I tried the following to remove duplicates first:

plink --vcf chr22.phase3.vcf.gz --list-duplicate-vars ids-only suppress-first

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar --make-bed --out chr22.phase3

plink --bfile chr22.phase3 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

That did not help, so I then manually added 'rs10656307' to the plink.dupvar file, saved it as plink.dupvar2, and ran again:

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar2 --make-bed --out chr22.phase3v2

plink --bfile chr22.phase3v2 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

This time the error message named a different SNP:

Error: Duplicate ID 'rs111334030'.

I wonder whether this is a problem with the 1000 Genomes phase 3 data, or whether I'm doing something wrong.
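
One way to check whether this comes from the data itself is to look for rsIDs that occur on more than one line of the VCF. As far as I understand the PLINK 1.9 documentation, --list-duplicate-vars groups variants by position and allele codes, not by ID, so it would not report an rsID that is reused at a different position or with different alleles. Below is a minimal sketch, assuming a POSIX shell with awk and gzip, and assuming the variant ID sits in the third tab-separated column as in a standard VCF. The function name list_dup_ids and the file names in the usage comments are hypothetical.

```shell
# list_dup_ids: read uncompressed VCF text on stdin and print every variant ID
# (column 3) that appears on more than one record, once per ID.
# Header lines (starting with '#') and missing IDs ('.') are skipped.
list_dup_ids() {
  awk -F'\t' '!/^#/ && $3 != "." { if (++seen[$3] == 2) print $3 }'
}

# Hypothetical usage (file names are placeholders):
#   gzip -dc chr22.phase3.vcf.gz | list_dup_ids > dup_ids.txt
#   plink --vcf chr22.phase3.vcf.gz --exclude dup_ids.txt --make-bed --out chr22.phase3.nodupid
```

If dup_ids.txt is non-empty, the duplicate IDs are already present in the downloaded file. Note that passing this list to --exclude would presumably drop every variant carrying one of those IDs, not only the extra copies, which may or may not be acceptable for the tag-SNP search.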


modified 2.5 years ago by Kevin Blighe • written 2.5 years ago by Opal