Question

GWAS SNP Imputation in Plant Species

2

11.1 years ago

kumar.vinod81 ▴ 330

I am working on a GWAS in a diploid plant species using 6000 SNPs and 300 individuals. My SNP data have several missing genotypes. Is there any way to impute my SNP dataset, given that no reference panel like the human HapMap project is available? What should I do with my SNP dataset, or can I use it directly for the association study? Thanks, Vinod

snp imputation • 5.3k views

updated 11.1 years ago by matted 7.8k • written 11.1 years ago by kumar.vinod81 ▴ 330

score 3 · Answer 1 · 2013-03-18

11.1 years ago

Philipp Bayer 8.3k

Try this:

Imputing missing genotypes with weighted k nearest neighbors.

This approach, called KNNcatImpute, searches for the k SNPs that are most similar to the SNP whose missing values need to be replaced and uses these k SNPs to impute the missing values. Alternatively, KNNcatImpute can search for the k nearest subjects. In this situation, the missing values of an individual are imputed by considering subjects showing a DNA pattern similar to the one of this individual.

So it doesn't have to use a reference panel but can impute your data based on similar individuals in your dataset.
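To make the "nearest subjects" idea concrete, here is a minimal Python sketch. It is not the KNNcatImpute implementation: it assumes genotypes are coded 0/1/2 with `np.nan` for missing calls, uses a plain mismatch fraction as the distance (the published method offers other distance measures), and weights each neighbour's vote by inverse distance. The function name `knn_impute` and the parameter `k` are made up for illustration.

```python
import numpy as np

def knn_impute(genotypes, k=3):
    """Impute missing calls (np.nan) from the k most similar individuals.

    genotypes: 2-D array, rows = individuals, columns = SNPs,
               coded 0/1/2 with np.nan for missing calls (assumed coding).
    """
    G = np.asarray(genotypes, dtype=float)
    imputed = G.copy()
    n = G.shape[0]
    for i in range(n):
        missing = np.flatnonzero(np.isnan(G[i]))
        if missing.size == 0:
            continue
        # Distance to every other individual: fraction of mismatching
        # calls over the SNPs observed in both individuals.
        dist = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = ~np.isnan(G[i]) & ~np.isnan(G[j])
            if shared.any():
                dist[j] = np.mean(G[i, shared] != G[j, shared])
        for s in missing:
            # Only individuals with an observed call at SNP s can vote.
            cand = np.flatnonzero(~np.isnan(G[:, s]) & np.isfinite(dist))
            if cand.size == 0:
                continue  # nothing to borrow from; leave the call missing
            nearest = cand[np.argsort(dist[cand], kind="stable")[:k]]
            weights = 1.0 / (dist[nearest] + 1e-6)
            # Weighted vote over the three genotype classes.
            votes = np.zeros(3)
            for idx, w in zip(nearest, weights):
                votes[int(G[idx, s])] += w
            imputed[i, s] = float(np.argmax(votes))
    return imputed

# Toy example: individual 1 is missing its last SNP.
toy = np.array([
    [0, 0, 2, 2],
    [0, 0, 2, np.nan],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])
result = knn_impute(toy, k=3)
```

In the toy matrix, individual 1 matches individual 0 on every shared SNP, so individual 0's call dominates the vote. On real data it would be worth masking some known calls and checking how often they are recovered before trusting the imputed values.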

Edit: If you don't want to try imputation, whether your data can tell you anything depends entirely on the statistical approach you use. For example, a logistic regression completely breaks down once you have a couple of missing values. I've had good results with a compressed mixed linear model, as implemented in TASSEL or GAPIT, when missing data were present.

0

Yes, I am also getting results with both of the tools you mentioned, but I want to check how the results change after the missing values are imputed. I'll try imputation with KNNcatImpute. Thanks for your valuable help.

11.1 years ago by kumar.vinod81 ▴ 330

0

One more thing I forgot in my last reply: GAPIT runs the analysis, but it automatically converts all missing values to heterozygotes, which is why I want to impute my data first.

11.1 years ago by kumar.vinod81 ▴ 330

score 3 · Answer 2 · 2013-03-18

11.1 years ago

matted 7.8k

R/qtl can impute missing genotypes if you're working with an experimental cross (maybe I can assume that since you mention a model organism).

Depending on the amount of missing data, a typical easy thing to do is to ignore the markers with many missing calls. If you can be sure that missingness does not correlate with phenotype, you can also just ignore the missing calls in a per-marker analysis.
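Dropping markers with many missing calls could look roughly like the sketch below. It assumes a NumPy matrix with individuals in rows, SNPs in columns, and `np.nan` for missing calls. The function name `filter_markers` and the 30% cutoff are illustrative choices, not a recommended threshold.

```python
import numpy as np

def filter_markers(genotypes, max_missing=0.2):
    """Drop SNP columns whose fraction of missing calls exceeds max_missing."""
    G = np.asarray(genotypes, dtype=float)
    # Per-marker missing rate: share of individuals with no call.
    missing_rate = np.isnan(G).mean(axis=0)
    keep = missing_rate <= max_missing
    return G[:, keep], keep, missing_rate

# Toy example: 4 individuals x 3 SNPs.
toy = np.array([
    [0, np.nan, 2],
    [1, np.nan, 2],
    [2, 0,      np.nan],
    [0, 1,      1],
])
filtered, keep, missing_rate = filter_markers(toy, max_missing=0.3)
```

Here the second SNP is missing in half of the individuals and is dropped, while the other two are kept. Comparing `missing_rate` between phenotype groups is one simple way to check whether missingness looks related to the trait.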

0

Thanks for your help. I have not tried R/qtl, but could you briefly explain on what basis it imputes the missing values? I can't ignore the missing values, as this is a GWAS and there are a lot of them. Your help is really pointing me toward important issues that I was not aware of before. Thanks.

11.1 years ago by kumar.vinod81 ▴ 330