Question: Removing position-duplicated SNPs with PLINK fails on 1000 Genomes Phase 3 data
Opal0 wrote (15 months ago):

Hello,

I downloaded the chromosome 22 genotype data from the 1000 Genomes Phase 3 release (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) and want to use PLINK to extract the SNPs in high LD with the SNPs in my list (test.txt). For this example, suppose test.txt contains only one line:

rs3761445

When I ran

plink --vcf chr22.phase3.vcf.gz --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

the error message said

Error: Duplicate ID 'rs10656307'.
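Because the error names only one ID at a time, it may help to first see how widespread the problem is by counting every ID that occurs on more than one data line of the VCF. A minimal sketch, assuming the standard VCF layout (ID in column 3) and POSIX tools; `count_dup_ids` is a hypothetical helper name, and lines whose ID is the missing value `.` are skipped:

```shell
# Hypothetical helper: for a (b)gzipped VCF, print "count<TAB>ID" for every
# ID that appears on more than one data line, most frequent first.
# Assumes column 3 is the ID column; "." (missing ID) is ignored.
count_dup_ids() {
  gzip -cd "$1" |
    grep -v '^#' |          # drop meta-information and header lines
    cut -f3 |               # keep only the ID column
    grep -Fvx '.' |         # skip missing IDs
    sort |
    uniq -c |               # "   N ID" for each distinct ID
    awk '$1 > 1 {print $1 "\t" $2}' |
    sort -k1,1nr -k2,2      # highest count first, then by ID
}
```

For example, `count_dup_ids chr22.phase3.vcf.gz | head` would show the most frequently repeated IDs, and piping into `wc -l` would give the number of distinct duplicated IDs. The whole file is decompressed in one stream, so this can take a while on a full chromosome.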

Following suggestions from earlier questions and answers on this forum, I first tried to remove the duplicates:

plink --vcf chr22.phase3.vcf.gz --list-duplicate-vars ids-only suppress-first

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar --make-bed --out chr22.phase3

plink --bfile chr22.phase3 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
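The PLINK 1.9 documentation describes `--list-duplicate-vars` as reporting variants that share the same position and allele codes, so an ID that is reused at a different position, or at the same position with different alleles (for example, a multi-allelic site split across several VCF lines), may not end up in plink.dupvar. One way to check this for a specific ID is to print every record that carries it. A sketch; `show_id_records` is a hypothetical helper:

```shell
# Hypothetical helper: print CHROM, POS, ID, REF and ALT (space-separated)
# for every data line of a (b)gzipped VCF whose ID column equals $2.
show_id_records() {
  gzip -cd "$1" |
    awk -F'\t' -v id="$2" '!/^#/ && $3 == id {print $1, $2, $3, $4, $5}'
}
```

For instance, `show_id_records chr22.phase3.vcf.gz rs10656307` would list each line carrying that ID, which shows whether the copies differ in position, in alleles, or in both.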

The error persisted, so I manually added 'rs10656307' to the plink.dupvar file, saved it as plink.dupvar2, and ran:

plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar2 --make-bed --out chr22.phase3v2

plink --bfile chr22.phase3v2 --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8

This time the error named a different SNP:

Error: Duplicate ID 'rs111334030'.
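Rather than appending duplicated IDs to plink.dupvar one at a time, the exclusion list could be built in a single pass by merging the `--list-duplicate-vars` output with every ID that appears on more than one VCF data line. An untested sketch; `build_exclude_list` and the file names in the usage comments are hypothetical. Since `--exclude` matches by ID, it is expected to drop every variant carrying a listed ID, not only the extra copies, so it is worth checking that the SNPs in test.txt are not in the resulting list.

```shell
# Hypothetical helper: write to $3 the sorted, de-duplicated union of
#   (a) the IDs already in $2 (e.g. plink.dupvar from --list-duplicate-vars), and
#   (b) every non-missing ID that occurs on more than one data line of VCF $1.
build_exclude_list() {
  local vcf=$1 dupvar=$2 out=$3
  {
    awk 1 "$dupvar"   # copy existing list; guarantees a trailing newline
    gzip -cd "$vcf" | grep -v '^#' | cut -f3 | grep -Fvx '.' | sort | uniq -d
  } | sort -u > "$out"
}

# Example usage (hypothetical output names):
#   build_exclude_list chr22.phase3.vcf.gz plink.dupvar plink.dupvar.all
#   plink --vcf chr22.phase3.vcf.gz --exclude plink.dupvar.all --make-bed --out chr22.phase3.nodup
#   plink --bfile chr22.phase3.nodup --show-tags test.txt --list-all --tag-kb 500 --tag-r2 0.8
```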

Is this a problem with the 1000 Genomes Phase 3 data itself, or am I doing something wrong?

Opal

Last modified 15 months ago by Kevin Blighe.