Question: How to bypass SNPs with identical A1 and A2 alleles in PLINK?
Asked 5.8 years ago by yorgos.athanasiadis (Denmark):

Hello,

I am trying to merge two imputed binary filesets (prephased with SHAPEIT and imputed with IMPUTE2) using PLINK's --bmerge command, but this error pops up:

Error: Identical A1 and A2 alleles on line 1

I am pretty sure there are many single-allele SNPs in my data, so I was wondering whether there is a quick way to solve this problem. I checked the PLINK manual, but there seems to be no option to ignore or correct such SNPs.

If possible, I would like to avoid looking for a solution in SHAPEIT or IMPUTE2, because prephasing and imputation already took a very long time to run.

Any ideas?


This might be due to an incompatibility between PLINK's Oxford import and the latest IMPUTE2 output format.  Can you send me a small .gen/.sample fileset that generates this problem?  (You can probably omit all lines in the .gen file past the first 3-4.)

(reply by chrchang523, 5.8 years ago)
Answer by yorgos.athanasiadis, 5.8 years ago:

Sure:

.gen (IMPUTE2) file (these are the first lines of the file; the SNPs in the converted PLINK fileset are in a different order):

--- 1:55565:G:A 55565 G A 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1

--- 1:55582:T:C 55582 T C 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1

--- 1:55588:T:C 55588 T C 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1

sample file:

ID_1 ID_2 missing sex 1
0 0 0 D 1
_0005432396f6cb97_FAM _0005432396f6cb97 0 2 1
_0022055d7384cd08_FAM _0022055d7384cd08 0 2 1
_0121e9eb137f1c02_FAM _0121e9eb137f1c02 0 1 1
_0187a92841fa96a2_FAM _0187a92841fa96a2 0 2 1
_018e122ad9751cf7_FAM _018e122ad9751cf7 0 2 1

I hope this helps.

Yorgos


Did any of these SNPs end up with equal A1/A2 alleles?  (I just tried importing and merging them, and did not have any problems.)

* If only other SNPs were affected, could you find the corresponding .gen line for one of them?

* If you actually did have problems with these specific SNPs, can you list the commands you used to import and merge your files?

(reply by chrchang523, 5.8 years ago)

Hi again,

No, I didn't have any problems with these specific SNPs; they just happened to be the first ones in the IMPUTE2 file, and I did exactly what you asked (i.e., pasted the first 3-4 lines). I guess the SNP order changed when I converted from .gen to PLINK format.

I have found the line in the IMPUTE2 file that contains the monomorphic SNP:

1 rs12564807 734462 A A 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0

* All the lines I've pasted here are truncated; the full lines are much longer and include many more individuals.

* I used grep and awk with a regex to build a list of such monomorphic SNPs and then removed them with PLINK's --exclude. This does resolve the issue; I was just wondering whether PLINK has a built-in flag to handle it. Incidentally, the error only appears when I use the merge commands.

(reply by yorgos.athanasiadis, 5.8 years ago)
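The grep + awk workaround described above can be sketched in Python as follows. This is a minimal sketch, not the poster's actual commands: it assumes PLINK took each variant ID from the second .gen column (as the rs12564807 line suggests) and that allele A and allele B sit in columns 4 and 5. The function names are hypothetical.

```python
from typing import Iterable, List


def identical_allele_ids(gen_lines: Iterable[str]) -> List[str]:
    """Return the variant IDs of .gen lines whose two allele codes are identical.

    Assumed column layout (as in the lines pasted above):
    1 = SNP ID or chromosome, 2 = variant ID, 3 = bp position,
    4 = allele A, 5 = allele B, 6+ = genotype probabilities.
    """
    ids = []
    for line in gen_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        if fields[3] == fields[4]:
            ids.append(fields[1])
    return ids


def write_exclude_list(gen_path: str, out_path: str) -> int:
    """Write one variant ID per line (the layout --exclude reads); return the count."""
    with open(gen_path) as src:
        ids = identical_allele_ids(src)
    with open(out_path, "w") as dst:
        for snp_id in ids:
            dst.write(snp_id + "\n")
    return len(ids)
```

The resulting list could then be applied before merging, for example: plink --bfile imputed_set --exclude identical_alleles.txt --make-bed --out imputed_set_clean (file names hypothetical).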

There currently isn't a built-in command, since this generally indicates a data processing error that should be fixed at the source.  But do you know what caused impute2 to generate a file with identical A1 and A2 allele codes here?  If it's a routine occurrence, and it only happens with monomorphic SNPs, I will modify the .gen (and .bgen, if necessary) import routines to automatically zero out one of the allele codes here.

(reply by chrchang523, 5.8 years ago)
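The import behaviour proposed above (zeroing out one of two identical allele codes) can be approximated as a preprocessing pass over the .gen file, so that the SNPs are kept rather than excluded. This is a minimal sketch with hypothetical function names, not PLINK's implementation; whether PLINK's .gen import then treats '0' as a missing allele code is an assumption worth checking on a small test file first.

```python
def zero_out_duplicate_allele(gen_line: str, missing_code: str = "0") -> str:
    """Return the .gen line with allele B replaced by missing_code when A == B.

    Hypothetical preprocessing step mimicking the proposed import behaviour;
    it is not PLINK's code. Fields are re-joined with single spaces and the
    trailing newline is dropped, whether or not the line was changed.
    """
    fields = gen_line.split()
    if len(fields) >= 5 and fields[3] == fields[4]:
        # Column 5 is allele B; for a monomorphic SNP with all probability
        # mass on the first (AA) genotype, allele B is never observed.
        fields[4] = missing_code
    return " ".join(fields)


def rewrite_gen(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, zeroing out duplicated allele codes."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # drop blank lines
                dst.write(zero_out_duplicate_allele(line) + "\n")
```

Streaming line by line keeps memory use flat, which matters for full-size imputed .gen files.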

Hi,

After looking back at all the files, I found the following:

1. The original PLINK .bim file I used for prephasing with SHAPEIT had this line for this SNP:

1    rs12564807    0    734462    0    A

2. Using that file, SHAPEIT returned this haplotype for the SNP (only part of it is shown here):

1 rs12564807 734462 A A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

I guess that '0' (instead of 'G') is responsible for the issue.

Y.

(reply by yorgos.athanasiadis, 5.8 years ago)
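Since the problem seems to originate in the .bim file used as SHAPEIT input, variants with a '0' allele code could be flagged before prephasing. This is a minimal sketch assuming the standard six-column .bim layout; the function name is hypothetical, and what to do with the flagged variants (exclude them, or fill in the missing allele from a reference panel) is left open.

```python
from typing import Iterable, List


def missing_allele_ids(bim_lines: Iterable[str]) -> List[str]:
    """Return IDs of .bim variants that have a '0' (missing) allele code.

    .bim columns: chromosome, variant ID, cM position, bp position,
    allele 1, allele 2. A monomorphic variant typically has '0' as
    allele 1, as in the rs12564807 line above.
    """
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        if "0" in (fields[4], fields[5]):
            ids.append(fields[1])
    return ids
```

Checking the .bim up front is cheap compared with rerunning prephasing and imputation after the fact.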
Powered by Biostar version 2.3.0