Question

merging GWAS data sets in Plink, error message: merge-equal-pos failure. Variants 'rs___a' and 'rs__b' have the same position, but do not share the same alleles

8.3 years ago

laurenleesc • 0

Hi,

I am trying to merge two GWAS data sets in Plink. Apparently there are multiple variant ids that share the same position but have different alleles. The merge command then errors out. I wish it would cycle through the remaining file and generate a whole list of these so that I could exclude them all at once. Does anyone know how to code this?

Thanks!

C:\Python27\Scripts>plink --bfile AABC_Ziv_Shanghai2 --bmerge CIDR_chr1_cleaned --make-bed --out CIDR_chr1_AABC_Ziv_shanghai --merge-equal-pos
PLINK v1.90b3.27 64-bit (13 Dec 2015)      https://www.cog-genomics.org/plink2
(C) 2005-2015 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CIDR_chr1_AABC_Ziv_shanghai.log.
Options in effect:
  --bfile AABC_Ziv_Shanghai2
  --bmerge CIDR_chr1_cleaned
  --make-bed
  --merge-equal-pos
  --out CIDR_chr1_AABC_Ziv_shanghai

7944 MB RAM detected; reserving 3972 MB for main workspace.
6320 people loaded from AABC_Ziv_Shanghai2.fam.
4001 people to be merged from CIDR_chr1_cleaned.fam.
Of these, 4001 are new, while 0 are present in the base dataset.
1449016 markers loaded from AABC_Ziv_Shanghai2.bim.
176885 markers to be merged from CIDR_chr1_cleaned.bim.
Of these, 143496 are new, while 33389 are present in the base dataset.
Warning: Variants 'rs3094315' and '1:752566' have the same position.
Warning: Variants 'rs4040617' and 'kgp5225889' have the same position.
Warning: Variants 'rs28609852' and 'kgp3324955' have the same position.
Error: --merge-equal-pos failure.  Variants 'rs17026104' and 'kgp4275897' have
the same position, but do not share the same alleles.

Plink merge GWAS error • 4.7k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.3 years ago by laurenleesc • 0

Hi, could you find the solution and the reason for this problem? My first data set is the 1000G reference panel (as controls) and the second one is the cases. When I try to merge them with Plink, I get this error even after cleaning the data.

Could you tell me what is the solution? Thank you in advance.

ADD REPLY • link 7.6 years ago by fatima ▴ 20

score 0 · Answer 1 · 2019-08-30

4.7 years ago

h.d.green • 0

I have a very rough solution to this.

awk '{print $1, $4}' data.bim > bps

gives you a file with the chromosome and base-pair position of each variant.

sort bps | uniq -c | awk '$1 > 1 {print $3}' > dupbps

gives a list of the base-pair positions that occur more than once.

grep -wf dupbps data.bim | awk '{print $2}' > excbp

outputs a list of SNPs to exclude.

It's not perfect (the grep matches the base-pair number as a whole word anywhere on the line, so it can also pick up variants at the same number on a different chromosome), but it's the best I can get. Been smashing my head against this lately.
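
A chromosome-aware variant of the same idea can be sketched in a single awk pass. This is only a sketch: the function name dup_pos_ids is made up for illustration, and it assumes standard six-column .bim files (chromosome, variant ID, cM, base-pair position, allele 1, allele 2). Passing both .bim files at once flags positions that carry more than one distinct variant ID across the two data sets, which is the situation the merge complains about. It flags every such position, not only those with mismatched alleles.

```shell
# Print the IDs of all variants that share a chromosome AND base-pair
# position with a differently named variant in the given .bim file(s).
dup_pos_ids() {
    awk '
        {
            key = $1 ":" $4                  # chromosome:position
            if (!((key, $2) in seen)) {      # count each ID once per position
                seen[key, $2] = 1
                n[key]++
                ids[key] = ids[key] $2 "\n"
            }
        }
        END {
            for (k in n)
                if (n[k] > 1)                # more than one distinct ID here
                    printf "%s", ids[k]
        }
    ' "$@" | sort -u
}

# Possible usage with the file names from the question:
# dup_pos_ids AABC_Ziv_Shanghai2.bim CIDR_chr1_cleaned.bim > excbp
# plink --bfile AABC_Ziv_Shanghai2 --exclude excbp --make-bed --out AABC_noDup
```

The resulting list can be passed to --exclude on each data set before retrying the merge.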

ADD COMMENT • link 4.7 years ago by h.d.green • 0

Have you tried using plink 2.0's --set-all-var-ids flag on all datasets beforehand, instead of --merge-equal-pos?
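
For readers unfamiliar with that flag, a minimal sketch of how it might be applied is shown here. The wrapper name set_pos_ids and the output prefixes are hypothetical, and the template '@:#:$r:$a' (chromosome, position, reference allele, alternate allele) is one common choice rather than something specified in this thread. Which allele PLINK treats as reference can differ between data sets, so check the renamed IDs before merging.

```shell
# Rewrite every variant ID in a binary fileset as chrom:pos:ref:alt,
# so that the same variant gets the same name in every data set.
# $1 = input fileset prefix, $2 = output fileset prefix (both hypothetical).
set_pos_ids() {
    plink2 --bfile "$1" \
           --set-all-var-ids '@:#:$r:$a' \
           --make-bed \
           --out "$2"
}

# Possible usage with the file names from the question:
# set_pos_ids AABC_Ziv_Shanghai2 AABC_renamed
# set_pos_ids CIDR_chr1_cleaned  CIDR_renamed
# plink --bfile AABC_renamed --bmerge CIDR_renamed --make-bed --out merged
```

With position-based IDs on both sides, the merge can match variants by name and --merge-equal-pos is no longer needed.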

ADD REPLY • link 4.7 years ago by chrchang523 10k