plink list-duplicate-vars not working
5.2 years ago

Hi all,

I am a newbie when it comes to plink and got stuck trying to work with some GWAS data that I received. The problem is that I have duplicates (same SNP name and location) in my BIM file.

I've tried using the --list-duplicate-vars option and when I do, it doesn't give me any errors, but the dupvar file is either completely empty or it only has a header: CHR - POS - ALLELES - IDS

I want to either delete the additional variant or rename it, so I can check later whether I need it. There is no way I can change the SNP names manually, as I have about 7 million variants. Can anyone tell me what I'm doing wrong or give me additional tips?

This is my code:

plink --bfile full_2 --list-duplicate-vars suppress-first --make-bed --out full_3 --memory 7000

I'm using PLINK 1.9 for Windows.

Thanks

Cynthia


Hello and welcome to Biostars, cynthiakusters,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

3.8 years ago

Hi Cynthia,

I've been having the same problem with plink lately! I think if you can't get plink to work, you can remove the duplicates from the bim file yourself by writing another script in bash or python.

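Here are some bash one-liners I've used lately to check for dupes. This one drops lines that are identical in every column. Note that it also reorders the file, and that removing lines from the .bim alone leaves it out of step with the .bed, so I'd only use the result for inspection:

sort -u your-file.bim > duplicates-removed.bim

If you don't want the file itself changed, you can use something like this to find the specific duplicate SNPs to remove:

cut -f2 your-file.bim | sort | uniq -c | sort -k2 | less -S   # column 2 of a .bim file holds the variant ID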

The plink documentation also has a workaround for removing duplicates by ID. Note that --exclude drops every copy of a listed ID, not just the extra ones. Here it is:

grep -v '^#' reference.vcf | cut -f 3 | sort | uniq -d > reference.dups
plink --bfile reference --exclude reference.dups --make-bed --out reference
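
For the renaming route mentioned in the question, here is a minimal sketch (my own, not from the plink documentation). It rewrites only column 2 of the .bim and leaves the line count and order untouched, so the file should still line up with the .bed. The file names and the `_dupN` suffix scheme are assumptions for illustration, and it assumes a tab-separated .bim.

```shell
# Build a tiny example .bim (tab-separated: chr, ID, cM, bp, A1, A2).
# File names and the "_dupN" suffix scheme are illustrative assumptions.
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n1\trs2\t0\t2000\tC\tG\n1\trs3\t0\t3000\tA\tC\n' \
    > "$workdir/example.bim"

# Keep the first occurrence of each ID unchanged; append _dup1, _dup2, ...
# to later occurrences. Line count and order are preserved.
awk 'BEGIN { FS = OFS = "\t" }
     { n = seen[$2]++; if (n > 0) $2 = $2 "_dup" n; print }' \
    "$workdir/example.bim" > "$workdir/renamed.bim"

# Show the resulting IDs on one line for a quick check.
renamed_ids=$(cut -f2 "$workdir/renamed.bim" | paste -sd, -)
echo "$renamed_ids"
```

With real data you would keep a backup of the original .bim and then either swap in the renamed file or point plink at it with --bim alongside the original --bed and --fam. The sketch does not check whether a generated name such as rs2_dup1 already exists in the file.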

Hope any of this helps!

-BF


The issue is that plink 1.9 --list-duplicate-vars is based on positions and alleles, not variant IDs. I will elaborate on this in the documentation; sorry about the confusion.
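
To illustrate the difference, here is a rough sketch on a made-up .bim in which one ID appears twice at the same position but with different alleles. It only approximates the idea of matching on position and alleles and is not plink's actual logic; the file name and variable names are assumptions.

```shell
# Made-up .bim: rs2 appears twice at the same position but with different alleles.
workdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n1\trs2\t0\t2000\tC\tG\n' \
    > "$workdir/example.bim"

# IDs that occur more than once (the kind of duplicate the question is about).
dup_by_id=$(cut -f2 "$workdir/example.bim" | sort | uniq -d)

# Records sharing chromosome, position and the same unordered allele pair;
# a rough stand-in for position-and-allele matching, not plink's exact rules.
dup_by_pos_alleles=$(awk 'BEGIN { FS = "\t" }
    { a = ($5 < $6) ? $5 "/" $6 : $6 "/" $5; print $1 ":" $4 ":" a }' \
    "$workdir/example.bim" | sort | uniq -d)

echo "duplicate IDs: ${dup_by_id:-none}"
echo "duplicate position+allele groups: ${dup_by_pos_alleles:-none}"
```

In this toy case the ID check flags rs2 while the position-and-allele check finds nothing, which would be consistent with a dupvar file that contains only a header.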

plink 2.0's --rm-dup (https://www.cog-genomics.org/plink/2.0/filter#rm_dup) should have the functionality you're actually looking for.
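
A hedged sketch of what such a call might look like, reusing the fileset prefixes from the question. It assumes the executable is named plink2 and is on the PATH, and it picks the force-first mode as one possible choice; the linked page describes the other modes and should be checked before running this on real data.

```shell
# Assumes plink2 is on PATH and that full_2.bed/.bim/.fam exist in the
# current directory (prefixes taken from the question).
in_prefix=full_2
out_prefix=full_3

if command -v plink2 >/dev/null 2>&1 && [ -f "$in_prefix.bed" ]; then
    # force-first keeps one record per duplicated ID; other modes exist.
    plink2 --bfile "$in_prefix" --rm-dup force-first --make-bed --out "$out_prefix"
else
    echo "plink2 or $in_prefix.bed not found; skipping" >&2
fi
```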
