Duplicate ID in bed file
4.1 years ago
janhuang.cn ▴ 170

I am using PLINK v1.90b3s 64-bit (17 Jun 2015) to generate an LD matrix from a 1000 Genomes VCF file for a long list of SNPs.

I use this command to convert the VCF to a binary (bed/bim/fam) fileset:

plink --vcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --make-bed --out binary_fileset

I then use this command to generate the LD report:

plink --r2 --bfile binary_fileset --ld-snp-list snp_chr22_sample.txt --ld-window-r2 0.8

But it returns the following error:

Error: Duplicate ID 'rs10656307'.

The SNP list file does not contain this SNP, so I think the bed fileset itself contains a duplicated record for rs10656307. Is there a way to remove duplicated SNPs from the bed fileset?

bed duplicate id • 6.7k views
3.2 years ago

I had the same problem and solved it in two steps:

1) Get all the duplicated IDs from the bim file: cut -f 2 ALL.chr1.bim | sort | uniq -d > 1.dups

2) Exclude these IDs from the bfile:

plink --bfile ALL.chr1 --exclude 1.dups --make-bed --out ALL.filt.chr1

With the new filtered files there were no errors when generating the LD reports.
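
For anyone who wants to try step 1 before touching real data, here is a minimal sketch on a toy bim file. The file name toy.bim, the ID rs_example_1 and all positions are made up for illustration. The PLINK call is left as a comment because it needs the matching .bed and .fam files.

```shell
# Hypothetical toy .bim (tab-separated: chr, variant ID, cM, bp, allele 1, allele 2).
# rs10656307 is deliberately listed twice; positions are invented.
printf '22\trs10656307\t0\t1000\tA\tG\n'    > toy.bim
printf '22\trs_example_1\t0\t2000\tG\tA\n' >> toy.bim
printf '22\trs10656307\t0\t3000\tC\tT\n'   >> toy.bim

# Step 1: column 2 holds the variant IDs; uniq -d prints each repeated ID once.
cut -f 2 toy.bim | sort | uniq -d > toy.dups
cat toy.dups

# Step 2 (requires PLINK plus toy.bed and toy.fam, so it is only shown here):
# plink --bfile toy --exclude toy.dups --make-bed --out toy.filt
```

Keep in mind that --exclude drops every variant carrying a listed ID, so both copies of a duplicated ID are removed, not just the extra one.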


Thanks, this worked for me.

4.1 years ago
Floris Brenk ★ 1.0k

Using --list-duplicate-vars you can identify the duplicates in the data (see "identifying duplicates" on the PLINK website).

And using --exclude you can remove those SNPs (see "removing SNPs" on the PLINK website).


I used --list-duplicate-vars to generate a list of duplicated SNPs, but it does not contain the one reported as a duplicate in the analysis.


Right, I think --list-duplicate-vars only finds SNPs duplicated by position, not by ID.
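
To make the distinction concrete, here is a rough sketch on a toy bim file that lists ID duplicates and position duplicates separately. This is not how PLINK implements either check; the file name toy2.bim, the IDs rs_x and rs_y, and all positions are invented for illustration.

```shell
# Hypothetical toy .bim (chr, variant ID, cM, bp, allele 1, allele 2).
# The two rs10656307 rows share an ID but not a position;
# rs_x and rs_y share a position but not an ID.
printf '22\trs10656307\t0\t1000\tA\tG\n'  > toy2.bim
printf '22\trs10656307\t0\t2000\tC\tT\n' >> toy2.bim
printf '22\trs_x\t0\t3000\tA\tC\n'       >> toy2.bim
printf '22\trs_y\t0\t3000\tA\tC\n'       >> toy2.bim

# IDs that occur more than once (what the "Duplicate ID" error complains about).
awk '{ n[$2]++ } END { for (id in n) if (n[id] > 1) print id }' toy2.bim > dup_ids.txt

# chr:bp pairs that occur more than once (a position-based notion of duplicate).
awk '{ n[$1 ":" $4]++ } END { for (p in n) if (n[p] > 1) print p }' toy2.bim > dup_pos.txt

cat dup_ids.txt
cat dup_pos.txt
```

In this toy case the two lists do not overlap, which would explain why a position-based report can miss an ID that PLINK rejects as a duplicate.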
