Question

PLINK error: locus has more than 2 alleles

5.8 years ago

mary ▴ 210

Dear all,

I built the .ped and .map files in the shell. When I read them with PLINK, I get the error below:

./plink --file test --noweb --missing-genotype N

54241 (of 54241) markers to be included from [ test.map ]

ERROR: Locus 1 has >2 alleles: individual R921C12 273487 has genotype [ T C ] but we've already seen [ - ] and [ T ]

I checked my file and it seems OK: the data at that position is indeed 'CC', with no '-' or 'T' nearby. The length of each line (i.e. for each individual) is consistent throughout. I've tried both tab- and space-delimited files, but it makes no difference. I don't understand why I get this error. This is the row that triggers the problem:

R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .

Any ideas?

plink snps ROH • 2.1k views


Is this the first sample in your file or are there others? PLINK may have identified the '-' allele in another sample prior to this one. Also, are you sure that missing genotypes are encoded as 'N' in your data? PLINK's default missing-genotype code is '0' ('-9' is the default code for a missing phenotype).
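To check which sample first introduces each allele code at a locus, something along these lines could help. It is only a sketch: the function names are made up for illustration, it assumes a whitespace-delimited .ped file with six leading columns before the genotypes, and the two sample rows (S1, S2) are invented rather than taken from the real data.

```python
def alleles_first_seen(ped_lines, missing="0", n_leading=6):
    """Map locus number -> {allele code: ID of the first sample carrying it}."""
    seen = {}
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id = fields[0]
        genotype_fields = fields[n_leading:]
        # Alleles come in pairs: positions 0-1 are locus 1, 2-3 are locus 2, ...
        for pos, allele in enumerate(genotype_fields):
            if allele == missing:
                continue  # the missing code does not count as an allele
            locus = pos // 2 + 1
            seen.setdefault(locus, {}).setdefault(allele, sample_id)
    return seen


def loci_with_extra_alleles(ped_lines, missing="0", n_leading=6):
    """Return {locus: {allele: first sample}} for loci with more than two allele codes."""
    seen = alleles_first_seen(ped_lines, missing, n_leading)
    return {locus: alleles for locus, alleles in seen.items() if len(alleles) > 2}


# Invented two-sample example (IDs S1 and S2 are hypothetical).
example = [
    "S1 1 0 0 1 -9 - - T T",
    "S2 2 0 0 1 -9 T C T T",
]
print(loci_with_extra_alleles(example, missing="N"))  # -> {1: {'-': 'S1', 'T': 'S2', 'C': 'S2'}}
print(loci_with_extra_alleles(example, missing="-"))  # -> {}
```

On a real file the lines could be read with `open("test.ped")` and passed in directly. Running it once with the missing code given to PLINK and once with the code actually present in the file shows whether the extra "allele" is just an undeclared missing-value symbol.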

5.8 years ago by Kevin Blighe 87k

Hi, this is the 2nd sample in the .ped file. Let me write out the commands I used to make the .ped file:

1- I have this file:

SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12

CL635944_160.1,0,0,--,CC,CC,CC,TC,TC

CR_594.1,0,0,--,TT,TT,TT,TT,TT

CR_816.1,0,0,--,CC,TT,TT,TT,TT

2- I use these two commands:

sed 's/,/ /g' a.csv > a2

awk '{$1=$2=$3=""; print $0}' a2 > a3

cat a3 (the start of each line is empty)

R921B12 R921C12 R921D12 R921E12 R921H11 

CC CC CC TC TC

TT TT TT TT TT

3- python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4

cat a4

R921B12 CC TT CC TC TT . . .

R921C12 CC TT TT CC TT . . .

4- perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' a4 >a5

cat a5

R921B12 C C T T C C T C T T . . .

R921C12 C C T T T T C C T T . . .

5- join <(sed -e 's/\t/ /g' 6col_ped | sort -k 1) <(sort -k 1 a5) > a6

cat a6

R921B12 273504 2910033 2910215 1 Resistant C C T T C C T C T T . . .

R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .

This is the .ped file that gives the error. Missing data in the .ped file is coded as '-'. Maybe you are right and I failed to separate the rows from each other. I have written out what I did; maybe I made a mistake somewhere?
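For comparison, steps 2 to 4 above can be sketched as a single Python function. This is an illustration under stated assumptions, not a drop-in replacement: the function name and its parameters are hypothetical, the first three CSV columns are assumed to be marker annotation, every genotype is assumed to be exactly two characters, and '-' is recoded to '0' (PLINK's default missing-genotype code). The input rows are invented.

```python
import csv


def csv_to_ped_genotypes(csv_lines, n_annotation=3, missing_in="-", missing_out="0"):
    """Return {sample ID: [allele, allele, ...]} with one allele pair per marker."""
    rows = [row for row in csv.reader(csv_lines) if row]  # drop blank lines
    header, data = rows[0], rows[1:]

    # zip()-based transposition silently truncates ragged rows, so check first.
    for row in data:
        if len(row) != len(header):
            raise ValueError(
                f"marker {row[0]!r} has {len(row)} fields, header has {len(header)}"
            )

    genotypes = {}
    for col in range(n_annotation, len(header)):
        alleles = []
        for row in data:
            call = row[col].strip()
            if len(call) != 2:
                raise ValueError(f"unexpected genotype {call!r} for marker {row[0]!r}")
            # Split the two-character call into alleles and recode missing values.
            alleles.extend(missing_out if a == missing_in else a for a in call)
        genotypes[header[col].strip()] = alleles
    return genotypes


# Invented input in the same layout as the CSV quoted above (IDs are hypothetical).
example = [
    "SNP_Name,Chr,Coordinate,S1,S2",
    "snp1,0,0,--,CC",
    "snp2,0,0,TT,TC",
]
for sample, alleles in csv_to_ped_genotypes(example).items():
    print(sample, " ".join(alleles))
# -> S1 0 0 T T
# -> S2 C C T C
```

The field-count check is there because Python's `zip` stops at the shortest row, so a header and data rows of different lengths get paired up without any warning. In the three-line excerpt quoted in step 1, the header has 8 fields while each data row has 9, so this check would raise on it; whether that reflects the real file or only the excerpt is not something the thread establishes. Joining the result to the six leading columns would still be done as in step 5.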

5.8 years ago by mary ▴ 210

I was able to input your data like this:

cat MapInfo.csv 
SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12
CL635944_160.1,0,0,--,CC,CC,CC,TC,TC
CR_594.1,0,0,--,TT,TT,TT,TT,TT
CR_816.1,0,0,--,CC,TT,TT,TT,TT

awk '{if (NR!=1) {print $2" "$1" 0 "$3}}' FS=, MapInfo.csv > MapInfo.map
cat MapInfo.map 
0 CL635944_160.1 0 0
0 CR_594.1 0 0
0 CR_816.1 0 0

cat test.ped 
R921B12 273504 2910033 2910215 1 Resistant C C T T C C
R921C12 273487 2950577 2950350 1 Resistant C C T T T T

/Programs/plink1.90/plink --file test --map MapInfo.map --noweb --missing-genotype N
PLINK v1.90b3.38 64-bit (7 Jun 2016)       https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file test
  --map MapInfo.map
  --missing-genotype N
  --noweb

Note: --noweb has no effect since no web check is implemented yet.
15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (3 variants, 2 people).
--file: plink.bed + plink.bim + plink.fam written.

However, something does not make sense with your data. For any PLINK data input, you should have a .map and .ped file.

I do not know what your input data is, but it has 3 columns named:

SNP_Name
Chr
Coordinate

That is enough to create the MAP file.

It then has:

R923A04
R921B12
R921C12
R921D12
R921E12

These must be sample IDs. In your original data (a.csv), these columns hold the sample genotypes. In the PLINK input file, a6, your samples should be represented as rows.

Let me know if any of this helps.

Kevin

5.8 years ago by Kevin Blighe 87k

Hi, I checked everything that might be related to this problem, but unfortunately it still doesn't work. The first column (sample IDs) is the family ID in the .ped file, so I think that is not related. Maybe I used the wrong command in this step (python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4), which caused two rows not to be separated.
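One way to test whether two rows were run together is to count the fields on every line of the .ped file. The sketch below assumes six leading columns followed by two allele fields per marker; the function name is hypothetical and the example rows are invented.

```python
def check_ped_field_counts(ped_lines, n_markers, n_leading=6):
    """Return (expected field count, [(line number, first field, actual count), ...])."""
    expected = n_leading + 2 * n_markers
    problems = []
    for line_no, line in enumerate(ped_lines, start=1):
        fields = line.split()
        # Blank lines are ignored; any other line must have the expected width.
        if fields and len(fields) != expected:
            problems.append((line_no, fields[0], len(fields)))
    return expected, problems


# Invented example with 2 markers; the second line is two rows run together.
example = [
    "S1 1 0 0 1 -9 C C T T",
    "S2 2 0 0 1 -9 C C T T S3 3 0 0 1 -9 T T T T",
]
expected, problems = check_ped_field_counts(example, n_markers=2)
print(expected)  # -> 10
print(problems)  # -> [(2, 'S2', 20)]
```

With the 54241 markers reported for test.map, every line of the .ped file should have 6 + 2 × 54241 = 108488 fields. A line with a multiple of that number would point to merged rows, while any other count would point to a problem in splitting the genotypes.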

5.8 years ago by mary ▴ 210