Question: PLINK error locus has more than 2 alleles
5.0 years ago by
European Union
biogirl190 wrote:

Hi all,


I've come across a problem in PLINK when trying to run a Fisher's exact test. The command I'm using is as follows:

plink --file test --fisher --allow-no-sex --1

And the error I get is:

ERROR: Locus 1:54208 has >2 alleles

               Individual Ind3 Ind3 has genotype [ G G ] but we've already seen [ A ] and [ T ]

I've checked my file rigorously and the genotype is indeed 'GG', with no A's or T's nearby! I also have no missing data. The length of each line (i.e. for each individual) is consistent throughout. I've tried both tab- and space-delimited files, but it makes no difference. I haven't found any special characters etc. either (using vi :set list).

Interestingly, I've taken Ind3 out of the file and re-run the test, but the same error is thrown up (but now obviously on Ind4, which is now on line 3).  

Any ideas?

plink snps gwas • 4.6k views
modified 5 months ago by Runen10 • written 5.0 years ago by biogirl190

Hi, How do you solve this problem?

written 5 months ago by Runen10

Please use ADD REPLY, not the answer box.

written 5 months ago by ATpoint28k

I already moved your comment, no need to double-post. The idea is simply to reserve the answer box for answers in order to keep the thread logically organized, no worries ;-)

written 5 months ago by ATpoint28k
5.0 years ago by
Brice Sarver3.3k
United States
Brice Sarver3.3k wrote:

PLINK requires that sites be biallelic. The check is made across every individual in the file, so if ANY other individual has a nucleotide that brings the site to three or more alleles, PLINK fails.

Barring this, your file is formatted incorrectly. From the plink manual:

Genotypes (column 7 onwards) should also be white-space delimited; they can be any character (e.g. 1,2,3,4 or A,C,G,T or anything else) except 0 which is, by default, the missing genotype character. All markers should be biallelic. All SNPs (whether haploid or not) must have two alleles specified. Either Both alleles should be missing (i.e. 0) or neither. No header row should be given. For example, here are two individuals typed for 3 SNPs (one row = one person):

     FAM001  1  0 0  1  2  A A  G G  A C 
     FAM001  2  0 0  1  2  A A  A G  0 0 

The default missing genotype character can be changed with the --missing-genotype option, for example:

plink --file mydata --missing-genotype N
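
The format rules quoted above (six leading columns, then two alleles per SNP, with both alleles missing or neither) can be checked mechanically before running PLINK. The following is only a minimal sketch: the function name `check_ped_line` and the third example row are made up for illustration, and it is not part of PLINK itself.

```python
def check_ped_line(line, n_snps, missing="0"):
    """Return a list of problems found in one PED line (empty list if none)."""
    # split() with no argument accepts any run of tabs or spaces as a delimiter
    fields = line.split()
    expected = 6 + 2 * n_snps  # 6 leading columns, then 2 alleles per SNP
    if len(fields) != expected:
        return [f"expected {expected} fields, found {len(fields)}"]

    problems = []
    genotypes = fields[6:]
    for i in range(n_snps):
        a1, a2 = genotypes[2 * i], genotypes[2 * i + 1]
        # Either both alleles are missing or neither is
        if (a1 == missing) != (a2 == missing):
            problems.append(f"SNP {i + 1}: only one allele is missing ({a1} {a2})")
    return problems


# The two rows from the manual excerpt, plus one deliberately broken row
# (the third row is a hypothetical example, not from the manual).
rows = [
    "FAM001  1  0 0  1  2  A A  G G  A C",
    "FAM001  2  0 0  1  2  A A  A G  0 0",
    "FAM001  3  0 0  1  2  A A  0 G  A C",
]
for row in rows:
    print(check_ped_line(row, n_snps=3))
# []
# []
# ['SNP 2: only one allele is missing (0 G)']
```

A file that passes this check can still trigger the ">2 alleles" error, because that error depends on the alleles seen across all individuals rather than on the layout of any single line.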

written 5.0 years ago by Brice Sarver3.3k

Hi, sorry, perhaps I wasn't clear in my original message.  My data is biallelic, for example:

Ind1 Ind1 0 0 0 1 A A G G A A T T

Ind2 Ind2 0 0 0 2 T T C C C C T T

I have followed the plink manual to the letter with regard to the delimiters in the file. The file encoding is correct, given that I can reduce the line length down to a bare minimum and execute plink OK. Therefore, I think the file format is fine. Or do you mean that the syntax within my file is incorrect?

written 5.0 years ago by biogirl190

I've just re-read your message and it's all come together.  So what you're saying is that Ind1 might have AA at that particular locus, whilst Ind2 might have TT.  So if Ind3 has CC, then it's going to fail.  Thanks, I think I can work around this now.
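
The situation described here can be checked directly by collecting, for each SNP column pair, the set of alleles seen across all individuals. This is only a sketch under stated assumptions: `find_multiallelic` is a made-up helper name, the first two rows are the example lines from this thread, and the `Ind3` row is invented to reproduce the three-allele case.

```python
from collections import defaultdict


def find_multiallelic(ped_lines, missing="0"):
    """Map 1-based SNP index -> sorted alleles, for SNPs with more than two alleles."""
    alleles = defaultdict(set)
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genotypes = fields[6:]  # the first 6 columns are family/individual info
        for i in range(0, len(genotypes) - 1, 2):
            for allele in genotypes[i:i + 2]:
                if allele != missing:
                    alleles[i // 2 + 1].add(allele)
    return {snp: sorted(seen) for snp, seen in alleles.items() if len(seen) > 2}


ped = [
    "Ind1 Ind1 0 0 0 1 A A G G A A T T",
    "Ind2 Ind2 0 0 0 2 T T C C C C T T",
    "Ind3 Ind3 0 0 0 1 G G C C A A T T",  # hypothetical third individual
]
print(find_multiallelic(ped))
# {1: ['A', 'G', 'T']}
```

The SNP indices returned correspond to line numbers in the matching .map file, whose second column holds the SNP identifier. One possible workaround, not confirmed anywhere in this thread, is to write those identifiers to a text file and drop the loci with PLINK's `--exclude` option before running the test.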

written 5.0 years ago by biogirl190

Yep, you've got it. Glad to help.

written 5.0 years ago by Brice Sarver3.3k

How did you work around this? I think plink should be able to handle this case itself. Thanks.

written 21 months ago by vamocksu0