Question: GWAS SNP Imputation in Plant Species
6.2 years ago by
New Delhi
kumar.vinod81280 wrote:

I am working on a GWAS in a diploid plant species, using 6000 SNPs and 300 individuals. My SNP data have several missing genotypes. Is there any way to impute my SNP dataset, given that no reference panel like the human HapMap project is available? What should I do with my SNP dataset, or can I use it directly for the association study? Thanks, Vinod

imputation snp • 3.8k views
6.2 years ago by
Philipp Bayer6.0k wrote:

Try this:

Imputing missing genotypes with weighted k nearest neighbors.

This approach, called KNNcatImpute, searches for the k SNPs that are most similar to the SNP whose missing values need to be replaced and uses these k SNPs to impute the missing values. Alternatively, KNNcatImpute can search for the k nearest subjects. In this situation, the missing values of an individual are imputed by considering subjects showing a DNA pattern similar to the one of this individual.

So it doesn't have to use a reference panel but can impute your data based on similar individuals in your dataset.
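To make the "k nearest subjects" idea concrete, here is a minimal Python sketch. It is not the KNNcatImpute implementation; it only illustrates the general technique under some assumptions of mine: genotypes are coded 0/1/2 in an individuals x markers array with `np.nan` for missing calls, distance is the fraction of mismatching genotypes over markers observed in both individuals, and neighbours vote with inverse-distance weights. The function name and the default `k=5` are hypothetical.

```python
import numpy as np

def knn_impute_genotypes(geno, k=5):
    """Impute missing genotypes from the k most similar individuals.

    geno: individuals x markers array coded 0/1/2, np.nan = missing (assumed coding).
    Returns a copy with missing calls filled where neighbours are available.
    """
    geno = np.asarray(geno, dtype=float)
    imputed = geno.copy()
    observed = ~np.isnan(geno)
    n_ind = geno.shape[0]

    for i in range(n_ind):
        missing_cols = np.where(~observed[i])[0]
        if missing_cols.size == 0:
            continue

        # Distance to every individual: mismatch rate over markers
        # that are observed in both individual i and the other one.
        shared = observed & observed[i]
        mismatches = (geno != geno[i]) & shared
        n_shared = shared.sum(axis=1)
        dist = np.full(n_ind, np.inf)
        ok = n_shared > 0
        dist[ok] = mismatches[ok].sum(axis=1) / n_shared[ok]
        dist[i] = np.inf  # never use the individual as its own neighbour

        for j in missing_cols:
            # Only individuals with an observed call at marker j can vote.
            candidates = np.where(observed[:, j] & np.isfinite(dist))[0]
            if candidates.size == 0:
                continue  # leave as missing if nobody can vote
            order = np.argsort(dist[candidates], kind="stable")[:k]
            nearest = candidates[order]
            weights = 1.0 / (dist[nearest] + 1e-6)  # closer neighbours count more
            votes = np.zeros(3)
            for g, w in zip(geno[nearest, j].astype(int), weights):
                votes[g] += w
            imputed[i, j] = votes.argmax()
    return imputed
```

Calling `knn_impute_genotypes(geno, k=5)` on your own matrix returns a filled copy and leaves observed calls untouched. A reasonable sanity check before trusting it is to mask some known genotypes, impute them, and see how often the original call is recovered.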

Edit: If you don't want to try imputation, whether your data can tell you anything depends entirely on the statistical approach you use. For example, a logistic regression completely breaks down once you have a couple of missing values. I've had good results with a compressed mixed linear model, as implemented in TASSEL or GAPIT, when missing data were present.


Yes, I am also getting results with both of the tools you mentioned, but I want to check how the results change after imputing the missing values. I'll try imputation with KNNcatImpute. Thanks for your valuable help.

Reply written 6.2 years ago by kumar.vinod81280

One more thing I forgot in my last reply: GAPIT runs the analysis, but it automatically converts all missing values to heterozygotes. That's why I want to impute my data first.

Reply written 6.1 years ago by kumar.vinod81280
6.2 years ago by
Boston, United States
matted7.0k wrote:

R/qtl can impute missing genotypes if you're working with an experimental cross (maybe I can assume that since you mention a model organism).

Depending on the amount of missing data, a typical easy thing to do is ignore the markers with many missing calls. If you can be sure missingness does not correlate with phenotype, you can also just ignore the missed calls in a per-marker analysis.
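As a rough illustration of both suggestions, the sketch below drops markers whose missing-call rate exceeds a threshold and computes, per marker, the correlation between "call is missing" and the phenotype. The coding (individuals x markers, `np.nan` for missing), the function names, and the `max_missing=0.2` cutoff are my assumptions, not recommendations from any particular tool.

```python
import numpy as np

def filter_markers_by_missingness(geno, max_missing=0.2):
    """Keep markers whose fraction of missing calls is <= max_missing.

    geno: individuals x markers array, np.nan = missing (assumed coding).
    Returns (filtered genotypes, boolean keep mask, per-marker missing rate).
    """
    geno = np.asarray(geno, dtype=float)
    missing_rate = np.isnan(geno).mean(axis=0)
    keep = missing_rate <= max_missing
    return geno[:, keep], keep, missing_rate

def missingness_phenotype_correlation(geno, phenotype):
    """Pearson correlation between the missing-call indicator and the phenotype.

    Returns one value per marker; np.nan where the indicator (or the
    phenotype) is constant and the correlation is undefined.
    """
    geno = np.asarray(geno, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    is_missing = np.isnan(geno).astype(float)
    r = np.full(geno.shape[1], np.nan)
    for j in range(geno.shape[1]):
        if is_missing[:, j].std() > 0 and phenotype.std() > 0:
            r[j] = np.corrcoef(is_missing[:, j], phenotype)[0, 1]
    return r
```

Markers with a large absolute correlation are the ones where simply ignoring missed calls is risky, because the missingness itself carries information about the phenotype.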


Thanks for your help. I have not tried R/qtl; can you briefly explain on what basis it imputes the missing values? I can't ignore the missing values, as this is a GWAS and there are a lot of them. Your help is pointing me toward important issues that I was not aware of earlier. Thanks.


Reply written 6.2 years ago by kumar.vinod81280

