PLINK error: locus has more than 2 alleles
1
1
6.2 years ago
biogirl ▴ 190

Hi all,

I've come across a problem in PLINK when trying to run a Fisher's exact test.  The command I'm using is as follows:

plink --file test --fisher --allow-no-sex --1

And the error I get is:

ERROR: Locus 1:54208 has >2 alleles

Individual Ind3 Ind3 has genotype [ G G ] but we've already seen [ A ] and [ T ]

I've checked my file rigorously and the data is indeed 'GG' with no A's or T's nearby!  I also have no missing data.  The length of each line (i.e. for each individual) is consistent throughout.  I've tried both tab- and space-delimited files, but it makes no difference.  I haven't found any special characters etc. either (using vi :set list).

Interestingly, I've taken Ind3 out of the file and re-run the test, but the same error is thrown up (but now obviously on Ind4, which is now on line 3).

Any ideas?

plink gwas snps • 5.7k views
0

Hi, how did you solve this problem?

0

Please use ADD REPLY, not the answer box.

0

I already moved your comment, no need to double-post. The idea is simply to reserve the answer box for answers in order to keep the thread logically organized, no worries ;-)

2
6.2 years ago
Brice Sarver ★ 3.6k

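PLINK requires that sites be biallelic. PLINK tallies alleles per site across all individuals, so if ANY other individual carries a nucleotide that brings the site to more than two distinct alleles, PLINK fails, and the error is reported at the individual where the third allele first appears.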

Barring this, your file is formatted incorrectly. From the PLINK manual:

Genotypes (column 7 onwards) should also be white-space delimited; they can be any character (e.g. 1,2,3,4 or A,C,G,T or anything else) except 0 which is, by default, the missing genotype character. All markers should be biallelic. All SNPs (whether haploid or not) must have two alleles specified. Either both alleles should be missing (i.e. 0) or neither. No header row should be given. For example, here are two individuals typed for 3 SNPs (one row = one person):

FAM001  1  0 0  1  2  A A  G G  A C
FAM001  2  0 0  1  2  A A  A G  0 0
...


The default missing genotype character can be changed with the --missing-genotype option, for example:

plink --file mydata --missing-genotype N

0

Hi, sorry, perhaps I wasn't clear in my original message.  My data is biallelic, for example:

Ind1 Ind1 0 0 0 1 A A G G A A T T

Ind2 Ind2 0 0 0 2 T T C C C C T T

I have followed the PLINK manual to the letter with regard to the delimiters in the file.  The file encoding is correct, given that I can reduce the line length down to a bare minimum and PLINK runs fine.  Therefore, I think the file format is ok.  Or do you mean the syntax within my file is incorrect?

0

I've just re-read your message and it's all come together.  So what you're saying is that Ind1 might have AA at that particular locus, whilst Ind2 might have TT.  So if Ind3 has CC, then it's going to fail.  Thanks, I think I can work around this now.

0

Yep, you've got it. Glad to help.
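
For anyone hitting the same error, here is a minimal sketch (in Python, not part of PLINK) of one way to find the offending sites before running the test. It assumes a standard .ped layout with six leading columns and 0 as the missing-genotype character. The function name multiallelic_loci and the Ind3 row are made up for illustration; the first two rows are the ones posted above.

```python
from collections import defaultdict

MISSING = "0"  # PLINK's default missing-genotype character


def multiallelic_loci(ped_lines, missing=MISSING):
    """Return {locus_index: sorted alleles} for loci with more than two alleles.

    locus_index is 0-based and follows the marker order of the .map file.
    """
    alleles = defaultdict(set)
    for line in ped_lines:
        fields = line.split()  # handles both tab- and space-delimited rows
        if not fields:
            continue  # skip blank lines
        genotypes = fields[6:]  # first six columns: FID, IID, PAT, MAT, SEX, PHENO
        for i in range(0, len(genotypes) - 1, 2):
            for allele in genotypes[i:i + 2]:
                if allele != missing:
                    alleles[i // 2].add(allele)
    return {locus: sorted(seen) for locus, seen in alleles.items() if len(seen) > 2}


rows = [
    "Ind1 Ind1 0 0 0 1 A A G G A A T T",
    "Ind2 Ind2 0 0 0 2 T T C C C C T T",
    "Ind3 Ind3 0 0 0 1 G G C C A A T T",  # hypothetical third individual
]
print(multiallelic_loci(rows))  # → {0: ['A', 'G', 'T']}
```

To scan a real file, pass the open file handle instead of the list, e.g. `with open("test.ped") as fh: print(multiallelic_loci(fh))`. Each reported index corresponds to the same-numbered row (counting from 0) in the .map file, so the flagged markers can then be fixed in the source data or dropped, for instance with PLINK's --exclude option.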

0

How did you work around this? I think plink should be able to figure this out. Thanks.