Question: PLINK error "Locus has >2 alleles" when reading a ped file
13 months ago by
mary210
Bologna university
mary210 wrote:

Dear all,

I built the ped and map files in the shell. When I read them with PLINK, I get the following error:

./plink --file test --noweb --missing-genotype N

54241 (of 54241) markers to be included from [ test.map ]

ERROR: Locus 1 has >2 alleles: individual R921C12 273487 has genotype [ T C ] but we've already seen [ - ] and [ T ]

I checked my file and it seems OK: the data are indeed 'CC', with no '-' or 'T' nearby. The length of each line (i.e. for each individual) is consistent throughout. I have tried both tab- and space-delimited files, with no difference. I don't understand why I get this error. This is the row that triggers it:

R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .

Any ideas?

roh plink snps
modified 13 months ago • written 13 months ago by mary210

Is this the first sample in your file, or are there others? PLINK may have encountered the '-' allele in an earlier sample. Also, are you sure that missing genotypes are encoded as 'N' in your data? PLINK's default missing-genotype code is '0' ('-9' is the default missing-phenotype code).
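One way to check this is to list, for each locus, every allele symbol that appears in the ped file once the declared missing code is set aside. The Python sketch below is only an illustration and not part of PLINK: the function names are made up, it assumes six leading columns before the genotypes, and the example rows (including the ID columns of the R923A04 row) are invented to mirror the error message above.

```python
from collections import defaultdict

def alleles_per_locus(ped_lines, missing="0", n_leading=6):
    """Map each locus number (1-based) to the set of allele symbols seen,
    ignoring the symbol declared as the missing-genotype code."""
    seen = defaultdict(set)
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genotypes = fields[n_leading:]
        # Alleles come in pairs: positions 0-1 are locus 1, 2-3 are locus 2, ...
        for i in range(0, len(genotypes) - 1, 2):
            locus = i // 2 + 1
            for allele in genotypes[i:i + 2]:
                if allele != missing:
                    seen[locus].add(allele)
    return seen

def loci_with_too_many_alleles(ped_lines, missing="0"):
    """Return {locus: sorted alleles} for loci showing more than 2 alleles."""
    counts = alleles_per_locus(ped_lines, missing)
    return {locus: sorted(a) for locus, a in counts.items() if len(a) > 2}

# Hypothetical rows: the first sample carries '-' at every locus.
example = [
    "R923A04 1 0 0 1 Resistant - - - - - -",
    "R921B12 273504 2910033 2910215 1 Resistant C C T T C C",
    "R921C12 273487 2950577 2950350 1 Resistant T C T T T T",
]

# With 'N' declared as missing, '-' counts as a real allele.
print(loci_with_too_many_alleles(example, missing="N"))
# {1: ['-', 'C', 'T'], 3: ['-', 'C', 'T']}

# With '-' declared as missing, no locus has more than 2 alleles.
print(loci_with_too_many_alleles(example, missing="-"))
# {}
```

If the first call reports loci and the second does not, the missing-genotype code passed to PLINK is the likely culprit rather than the row named in the error.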

written 13 months ago by Kevin Blighe

Hi, this is the 2nd sample in the ped file. Let me show the commands I used to make the ped file:

1- I have this file (a.csv):

SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12

CL635944_160.1,0,0,--,CC,CC,CC,TC,TC

CR_594.1,0,0,--,TT,TT,TT,TT,TT

CR_816.1,0,0,--,CC,TT,TT,TT,TT

2- I use these two commands:

sed 's/,/ /g' a.csv > a2

awk '{$1=$2=$3=""; print $0}' a2 > a3

cat a3 (the first three fields of each line are now empty)

R921B12 R921C12 R921D12 R921E12 R921H11 

CC CC CC TC TC

TT TT TT TT TT

3- python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4

cat a4

R921B12 CC TT CC TC TT . . .

R921C12 CC TT TT CC TT . . .

4- perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' a4 >a5

cat a5

R921B12 C C T T C C T C T T . . .

R921C12 C C T T T T C C T T . . .

5- join <(sed -e 's/\t/ /g' 6col_ped | sort -k 1) <(sort -k 1 a5) > a6

cat a6

R921B12 273504 2910033 2910215 1 Resistant C C T T C C T C T T . . .

R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .

This is the ped file that gives the error. Missing data in the ped file are coded as '-'. Maybe you are right and I could not separate the rows from each other. I have written out what I did; maybe I made a mistake somewhere?
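For comparison, steps 2 to 4 can be sketched as a single Python step that reads the CSV, transposes it and splits each two-letter call into two alleles. This is a sketch under assumptions, not a verified replacement for the commands above: the function name and its parameters are hypothetical, the input is a shortened copy of the posted CSV with a header that matches the rows, and '-' is recoded to '0'. Unlike zip(), which silently stops at the shortest row, it raises an error when a row has a different number of calls than the header has samples.

```python
import csv
import io

def csv_to_genotype_rows(csv_text, n_annot=3, missing_in="-", missing_out="0"):
    """Turn a SNP-by-sample CSV into one 'ID a1 a2 a1 a2 ...' row per sample."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[n_annot:]          # sample IDs follow the annotation columns
    alleles = {s: [] for s in samples}
    for row_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        calls = row[n_annot:]
        if len(calls) != len(samples):
            raise ValueError(
                f"line {row_number}: {len(calls)} calls for {len(samples)} samples"
            )
        for sample, call in zip(samples, calls):
            if len(call) != 2:
                raise ValueError(f"line {row_number}: unexpected call {call!r}")
            # Split 'TC' into 'T', 'C' and recode the missing symbol.
            alleles[sample].extend(
                missing_out if a == missing_in else a for a in call
            )
    return [" ".join([s] + alleles[s]) for s in samples]

# Shortened, hypothetical version of the posted file.
csv_text = """SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12
CL635944_160.1,0,0,--,CC,CC
CR_594.1,0,0,--,TT,TT
CR_816.1,0,0,--,CC,TT
"""

for line in csv_to_genotype_rows(csv_text):
    print(line)
# R923A04 0 0 0 0 0 0
# R921B12 C C T T C C
# R921C12 C C T T T T
```

The resulting rows still need the five remaining ped columns joined on, as in step 5.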

modified 13 months ago • written 13 months ago by mary210

I was able to input your data like this:

cat MapInfo.csv 
SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12
CL635944_160.1,0,0,--,CC,CC,CC,TC,TC
CR_594.1,0,0,--,TT,TT,TT,TT,TT
CR_816.1,0,0,--,CC,TT,TT,TT,TT

awk '{if (NR!=1) {print $2" "$1" 0 "$3}}' FS=, MapInfo.csv > MapInfo.map
cat MapInfo.map 
0 CL635944_160.1 0 0
0 CR_594.1 0 0
0 CR_816.1 0 0

cat test.ped 
R921B12 273504 2910033 2910215 1 Resistant C C T T C C
R921C12 273487 2950577 2950350 1 Resistant C C T T T T

/Programs/plink1.90/plink --file test --map MapInfo.map --noweb --missing-genotype N
PLINK v1.90b3.38 64-bit (7 Jun 2016)       https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file test
  --map MapInfo.map
  --missing-genotype N
  --noweb

Note: --noweb has no effect since no web check is implemented yet.
15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 2 people).
--file: plink.bed + plink.bim + plink.fam written.

However, something does not make sense with your data. For any PLINK data input, you should have a .map and .ped file.

I do not know what your input data is, but it has 3 columns named:

  • SNP_Name
  • Chr
  • Coordinate

That is enough to create the MAP file.

It then has:

  • R923A04
  • R921B12
  • R921C12
  • R921D12
  • R921E12

These must be sample IDs. In your original data (a.csv), these columns represent the sample genotypes. In the PLINK input file, a6, your samples should be represented on rows.

Let me know if any of this helps.

Kevin

written 13 months ago by Kevin Blighe

Hi, I checked everything that might be related to this problem, but unfortunately it still doesn't work. The first column (sample IDs) is the family ID in the ped file, so I think it is not related. Maybe I used a wrong command in this step (python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4), which caused two rows not to be separated.
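Whether two rows have run together can be tested by counting fields: with six leading columns, every ped line should hold 6 + 2 x (number of map lines) fields. The sketch below is a hypothetical helper, not a PLINK feature. It uses the three-marker map and two-sample ped shown earlier in the thread, plus an invented line in which the two samples are merged.

```python
def check_ped_against_map(ped_lines, map_lines, n_leading=6):
    """Return (expected field count, list of (line number, first ID, field count))
    for every ped line whose field count does not match the map."""
    n_markers = sum(1 for line in map_lines if line.strip())
    expected = n_leading + 2 * n_markers
    problems = []
    for number, line in enumerate(ped_lines, start=1):
        fields = line.split()  # handles tabs and spaces alike
        if not fields:
            continue
        if len(fields) != expected:
            problems.append((number, fields[0], len(fields)))
    return expected, problems

map_lines = [
    "0 CL635944_160.1 0 0",
    "0 CR_594.1 0 0",
    "0 CR_816.1 0 0",
]
ped_lines = [
    "R921B12 273504 2910033 2910215 1 Resistant C C T T C C",
    "R921C12 273487 2950577 2950350 1 Resistant C C T T T T",
]

print(check_ped_against_map(ped_lines, map_lines))
# (12, [])

# Invented failure case: both samples on a single line.
merged = [" ".join(ped_lines)]
print(check_ped_against_map(merged, map_lines))
# (12, [(1, 'R921B12', 24)])
```

An empty problem list means the row structure is consistent with the map, which would point back to the allele coding rather than the transposition step.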

written 13 months ago by mary210