Question: How To Deal With Missing (NA) Data In PLINK
6.6 years ago, xiaoyanyan97 wrote:

I have a dataset like this:

1 1 0 0 1  1  A A  G T
2 1 0 0 1  1  A C  T G
3 1 0 0 1  1  0 0  G G
4 1 0 0 1  2  A C  T T
5 1 0 0 1  2  C C  G T
6 1 0 0 1  2  C C  T T
.ped

1 snp1 0 1
1 snp2 0 2
.map

I used the --recodeA option to convert them to:

FID IID PAT MAT SEX PHENOTYPE snp1_A snp2_G
1 1 0 0 1 1 2 1
2 1 0 0 1 1 1 1
3 1 0 0 1 1 NA 2
4 1 0 0 1 2 1 0
5 1 0 0 1 2 0 1
6 1 0 0 1 2 0 0
.raw

There is an NA in my data, but NA is not allowed in my analysis. How can I deal with it in PLINK?

Thank you.

plink snp • 4.2k views
modified 6.6 years ago by Istvan Albert • written 6.6 years ago by xiaoyanyan97

Please clarify: why do you need to convert it to raw (recodeA) format? If you are going to use PLINK for the analysis, why convert at all?

written 6.6 years ago by zx8754

Because I am fitting a linear regression with a model that does not allow NA (missing genotypes), I have to convert NA to some other value. Someone told me that PLINK can handle NA (missing genotypes); I have looked for a way but could not get it to work. Because my data come from an experiment, I can't recode NA to an arbitrary value.

written 6.6 years ago by xiaoyanyan97

It is still not clear why you need to convert to raw format. You could just run plink --file mydata --linear on the original PED/MAP files. See: PLINK - Linear and logistic models

written 6.6 years ago by zx8754

Sorry, it is a different model, group lasso, and it does not allow NA. Before I can use it, I have to convert my data (including 50kb SNPs, coded with A/T/C/G) to 0, 1, 2. Because there are 0 0 genotypes in my original data, NA appears in the converted data.

written 6.6 years ago by xiaoyanyan97
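
For illustration, the 0/1/2 conversion described in this thread can be sketched as counting copies of one allele per genotype, with a 0 allele treated as a no-call. This is only a sketch that reproduces the example .raw columns from the question; `additive_code` is a hypothetical helper, not PLINK's own code, and the counted allele is taken from the .raw header (snp1_A, snp2_G).

```python
# Sketch of additive (0/1/2) genotype coding.
# Assumption: "0" marks a missing allele, as in the .ped example in the question.
# additive_code is a hypothetical helper, not part of PLINK.

def additive_code(allele1, allele2, counted):
    """Return the number of copies of `counted`, or None for a no-call."""
    if allele1 == "0" or allele2 == "0":
        return None  # missing genotype; shown as NA in the .raw file
    return (allele1 == counted) + (allele2 == counted)

# Genotypes of the six individuals from the example .ped file.
snp1 = [("A", "A"), ("A", "C"), ("0", "0"), ("A", "C"), ("C", "C"), ("C", "C")]
snp2 = [("G", "T"), ("T", "G"), ("G", "G"), ("T", "T"), ("G", "T"), ("T", "T")]

snp1_A = [additive_code(a, b, "A") for a, b in snp1]
snp2_G = [additive_code(a, b, "G") for a, b in snp2]

print(["NA" if c is None else c for c in snp1_A])
print(["NA" if c is None else c for c in snp2_G])
```

The third individual's 0 0 genotype at snp1 is the only source of NA in the converted data.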

00 means no call; when converted to raw format, it becomes NA (not available). These samples need to be excluded from the analysis. In R, to exclude samples: snp1_A <- my.raw[ !is.na(my.raw$snp1_A), "snp1_A"]

written 6.6 years ago by zx8754
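
The exclusion suggested above can also be applied to whole rows, so that every remaining individual has a complete genotype vector. The following is a minimal sketch, assuming the .raw layout shown in the question (six leading columns, then one column per SNP); `drop_incomplete_rows` and `RAW_TEXT` are hypothetical names used only for this example.

```python
import io

# The example .raw content from the question, embedded so the sketch is self-contained.
RAW_TEXT = """FID IID PAT MAT SEX PHENOTYPE snp1_A snp2_G
1 1 0 0 1 1 2 1
2 1 0 0 1 1 1 1
3 1 0 0 1 1 NA 2
4 1 0 0 1 2 1 0
5 1 0 0 1 2 0 1
6 1 0 0 1 2 0 0
"""

def drop_incomplete_rows(handle, n_leading=6):
    """Return (header, rows), keeping only rows with no NA genotype.

    Assumption: the first `n_leading` columns are FID..PHENOTYPE and
    every later column is a genotype coded 0/1/2 or NA.
    """
    header = handle.readline().split()
    kept = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if "NA" in fields[n_leading:]:
            continue  # at least one missing genotype: exclude this individual
        kept.append(fields)
    return header, kept

header, kept = drop_incomplete_rows(io.StringIO(RAW_TEXT))
print(len(kept), "of 6 individuals kept")
```

Dropping rows keeps the remaining matrix free of NA, at the cost of losing every individual with even one no-call.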

I have tried it, but my model is a function that has already been designed. What I need to do is supply my data as x (a matrix that includes the recoded missing values); with your method, the data will no longer be intact.

written 6.6 years ago by xiaoyanyan97

Is there a method in PLINK that can fill in the NA based on the other SNPs, so that the error will be lower?

written 6.6 years ago by xiaoyanyan97

Open the file with Notepad and replace NA with whatever you like.

written 6.6 years ago by PoGibas

Because my data are real and my genotypes are coded 0, 1, 2, I can't just replace NA (missing genotypes) with whatever I like.

written 6.6 years ago by xiaoyanyan97
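
One simple, data-driven alternative to an arbitrary replacement value is to fill each NA with the mean of the observed codes at the same SNP. The sketch below is not a PLINK feature and does not use information from the other SNPs as asked above; it is a basic baseline that borrows from the other individuals at that SNP, and `mean_impute` is a hypothetical helper. Note that the filled value is generally fractional rather than exactly 0, 1, or 2.

```python
# Sketch of per-SNP mean imputation on additive (0/1/2) codes.
# Assumption: missing genotypes are represented as None after reading the .raw file.
# mean_impute is a hypothetical helper, not a PLINK function.

def mean_impute(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [g for g in column if g is not None]
    if not observed:
        raise ValueError("SNP has no observed genotypes to impute from")
    mean = sum(observed) / len(observed)
    return [mean if g is None else g for g in column]

# snp1_A column from the example .raw file, with NA read as None.
snp1_A = [2, 1, None, 1, 0, 0]
filled = mean_impute(snp1_A)
print(filled)  # the missing entry becomes 4 / 5 = 0.8
```

Whether a fractional dosage is acceptable depends on the downstream model; imputing from neighbouring SNPs would need dedicated imputation software rather than this baseline.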