Question: PLINK error locus has more than 2 alleles
4.2 years ago by
biogirl160
European Union
biogirl160 wrote:

Hi all,

 

I've come across a problem in PLINK when trying to run Fisher's exact test.  The command I'm using is as follows:

plink --file test --fisher --allow-no-sex --1

And the error I get is:

ERROR: Locus 1:54208 has >2 alleles

               Individual Ind3 Ind3 has genotype [ G G ] but we've already seen [ A ] and [ T ]

I've checked my file rigorously and the data is indeed 'GG' with no A's or T's nearby!  I also have no missing data.  The length of each line (i.e. for each individual) is consistent throughout.  I've tried both tab- and space-delimited files, but it makes no difference.  I haven't found any special characters etc. either (using vi :set list).

Interestingly, I've taken Ind3 out of the file and re-run the test, but the same error is thrown up (but now obviously on Ind4, which is now on line 3).  

Any ideas?

plink snps gwas • 4.0k views
4.2 years ago by
Brice Sarver2.6k
United States
Brice Sarver2.6k wrote:

PLINK requires that sites be biallelic. The check is made across the whole file, not per individual: if ANY other individual carries a nucleotide that brings the number of distinct alleles at that site above two, PLINK fails.

Barring this, your file is formatted incorrectly. From the plink manual:

Genotypes (column 7 onwards) should also be white-space delimited; they can be any character (e.g. 1,2,3,4 or A,C,G,T or anything else) except 0 which is, by default, the missing genotype character. All markers should be biallelic. All SNPs (whether haploid or not) must have two alleles specified. Either Both alleles should be missing (i.e. 0) or neither. No header row should be given. For example, here are two individuals typed for 3 SNPs (one row = one person):

     FAM001  1  0 0  1  2  A A  G G  A C 
     FAM001  2  0 0  1  2  A A  A G  0 0 
     ...

The default missing genotype character can be changed with the --missing-genotype option, for example:

plink --file mydata --missing-genotype N
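The per-row rules quoted above (six leading columns, then whitespace-delimited allele pairs, with either both alleles missing or neither) can be checked before handing the file to PLINK. The following is only a rough sketch, not PLINK's own validation: `check_ped_line` is a hypothetical helper name, and the `missing` parameter is assumed to mirror whatever character is passed to --missing-genotype.

```python
def check_ped_line(line, missing="0"):
    """Return a list of problems found in one whitespace-delimited PED row.

    Hypothetical helper for illustration; it covers only the rules quoted
    from the manual, not everything PLINK itself checks.
    """
    fields = line.split()
    if len(fields) < 6:
        return ["fewer than 6 leading columns"]

    # Columns 1-6: family ID, individual ID, paternal ID, maternal ID,
    # sex, phenotype. Everything after that is allele calls, two per SNP.
    alleles = fields[6:]
    problems = []
    if len(alleles) % 2 != 0:
        problems.append("odd number of allele columns")

    # Walk the allele columns pairwise; SNPs are numbered from 1.
    for snp, i in enumerate(range(0, len(alleles) - 1, 2), start=1):
        first, second = alleles[i], alleles[i + 1]
        # Either both alleles are missing or neither is.
        if (first == missing) != (second == missing):
            problems.append(f"SNP {snp}: only one allele is missing")
    return problems


# The two rows from the manual excerpt pass the check.
print(check_ped_line("FAM001  1  0 0  1  2  A A  G G  A C"))
print(check_ped_line("FAM001  2  0 0  1  2  A A  A G  0 0"))
```

A row that passes this check can still trigger the ">2 alleles" error, because that error depends on what the other individuals carry at the same locus.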


Hi, sorry, perhaps I wasn't clear in my original message.  My data is biallelic, for example:

Ind1 Ind1 0 0 0 1 A A G G A A T T

Ind2 Ind2 0 0 0 2 T T C C C C T T

I have followed the PLINK manual to the letter with regard to the delimiters in the file.  The file encoding is correct, given that I can reduce the line length down to a bare minimum and PLINK runs OK.  Therefore, I think the file format is OK.  Or do you mean my syntax is incorrect in the file?

Reply written 4.2 years ago by biogirl160

I've just re-read your message and it's all come together.  So what you're saying is that Ind1 might have AA at that particular locus, whilst Ind2 might have TT.  So if Ind3 has CC, then it's going to fail.  Thanks, I think I can work around this now.
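That scenario can be reproduced outside PLINK by collecting the distinct alleles seen at each locus across all rows of the .ped file. This is a minimal sketch under some assumptions: the function names are made up for illustration, loci are reported by their 0-based position in the row rather than by the chr:position labels that come from the .map file, and the Ind3 row is a hypothetical one built to match the error message.

```python
from collections import defaultdict

MISSING = "0"  # PLINK's default missing-genotype character


def alleles_per_locus(ped_lines, missing=MISSING):
    """Map each 0-based locus index to the set of alleles seen in any row."""
    seen = defaultdict(set)
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genotypes = fields[6:]  # skip the six leading ID/sex/phenotype columns
        for locus, i in enumerate(range(0, len(genotypes) - 1, 2)):
            for allele in genotypes[i:i + 2]:
                if allele != missing:
                    seen[locus].add(allele)
    return dict(seen)


def multiallelic_loci(ped_lines, missing=MISSING):
    """Return {locus index: sorted alleles} for loci with more than 2 alleles."""
    return {
        locus: sorted(alleles)
        for locus, alleles in alleles_per_locus(ped_lines, missing).items()
        if len(alleles) > 2
    }


ped = [
    "Ind1 Ind1 0 0 0 1 A A G G A A T T",
    "Ind2 Ind2 0 0 0 2 T T C C C C T T",
    "Ind3 Ind3 0 0 0 1 G G G G A A T T",  # hypothetical row, not from the real file
]
print(multiallelic_loci(ped))  # {0: ['A', 'G', 'T']}
```

Each row is biallelic when read on its own, which is why inspecting only the line named in the error shows nothing wrong. The first locus only becomes a problem once A, T and G have all been seen across individuals.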

Reply written 4.2 years ago by biogirl160

Yep, you've got it. Glad to help.

Reply written 4.2 years ago by Brice Sarver2.6k

How did you work around this? I think plink should be able to figure this out. Thanks.

Reply written 12 months ago by vamocksu0