How do I remove duplicate SNPs in PLINK from more than 1 data set?
6 months ago
u19010843 • 0

Hi there,

I am trying to remove duplicate SNPs from my data, but I have data from 6 different panels and I am not sure how to process them all in PLINK at once.

SNPs Plink • 521 views
6 months ago
bk11 ★ 2.4k

For each of your datasets, first find the duplicated SNP IDs, then use PLINK's --exclude option to remove them.

#Finding duplicate SNPs first in dataset1
awk '{print $2}' data1.bim |sort |uniq -d >data1_duplicate_snps.list

#Removing duplicate SNPs from dataset1
plink --bfile data1 --exclude data1_duplicate_snps.list --make-bed --out data1_duplicateSNPs_removed
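The two steps above can be wrapped in a loop so that all six panels are handled in one go. This is only a sketch: the function names and the prefixes `data1` to `data6` are hypothetical, and it assumes `plink` is on the PATH and each panel is a binary fileset (.bed/.bim/.fam). Note that --exclude drops every variant whose ID appears in the list, so all copies of a duplicated ID are removed rather than all but one.

```shell
# Write the duplicated variant IDs of one fileset to <prefix>_duplicate_snps.list.
# Column 2 of a .bim file holds the variant ID.
list_duplicate_ids() {
    local prefix="$1"
    awk '{print $2}' "${prefix}.bim" | sort | uniq -d > "${prefix}_duplicate_snps.list"
}

# Apply both steps to every panel prefix passed as an argument.
dedup_panels() {
    local prefix
    for prefix in "$@"; do
        list_duplicate_ids "$prefix" || return 1
        # --exclude removes every variant whose ID is in the list.
        plink --bfile "$prefix" \
              --exclude "${prefix}_duplicate_snps.list" \
              --make-bed \
              --out "${prefix}_duplicateSNPs_removed" || return 1
    done
}

# Hypothetical usage:
#   dedup_panels data1 data2 data3 data4 data5 data6
```

For non-human data, a species flag such as --sheep would likely need to be added to the plink call.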

Thank you for taking the time to reply! Before I saw your comment I tried the commands below. They worked, but I am not sure whether this was the correct thing to do.

./plink --noweb --sheep --bfile MER_GSv2 --list-duplicate-vars ids-only suppress-first
./plink --noweb --sheep --bfile MER_GSv2 --exclude plink.dupvar --make-bed --out GSv2.Dup

I then repeated the steps above for each of my panels. This is the log from the second command:

Options in effect:
  --bfile MER_GSv2
  --exclude plink.dupvar
  --make-bed
  --noweb
  --out GSv2.Dup
  --sheep

Note: --noweb has no effect since no web check is implemented yet.
8192 MB RAM detected; reserving 4096 MB for main workspace.
50115 variants loaded from .bim file.
460 sheep (24 males, 436 females) loaded from .fam.
--exclude: 49235 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 0 founders and 460 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.954605.
49235 variants and 460 sheep pass filters and QC.
Note: No phenotypes present.
--make-bed to GSv2.Dup.bed + GSv2.Dup.bim + GSv2.Dup.fam ... done.
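For reference, the same two commands can be looped over several panels, as in the sketch below. It is not a definitive recipe: the function name, the `PLINK` override variable, the `.Dup` output naming, and any prefix other than MER_GSv2 are assumptions made for illustration. Unlike the awk approach, --list-duplicate-vars groups variants by position and allele codes rather than by ID, and `suppress-first` keeps the first member of each group. Passing --out in the first step gives each panel its own `<prefix>.dupvar`, so the default `plink.dupvar` is not overwritten between panels.

```shell
# Position/allele-based de-duplication for each panel prefix given as an argument.
# Set PLINK to choose the binary (e.g. PLINK=./plink); it defaults to "plink".
dedup_by_position() {
    local plink_bin="${PLINK:-plink}"
    local prefix
    for prefix in "$@"; do
        # Step 1: write <prefix>.dupvar, listing all but the first variant
        # in each group that shares position and allele codes.
        "$plink_bin" --sheep --bfile "$prefix" \
            --list-duplicate-vars ids-only suppress-first \
            --out "$prefix" || return 1
        # Step 2: drop the listed IDs and write a new binary fileset.
        "$plink_bin" --sheep --bfile "$prefix" \
            --exclude "${prefix}.dupvar" \
            --make-bed --out "${prefix}.Dup" || return 1
    done
}

# Hypothetical usage (only MER_GSv2 comes from the thread):
#   PLINK=./plink dedup_by_position MER_GSv2 panel2 panel3
```

The --noweb flag is left out because, as the log notes, it has no effect in this PLINK version.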
