I am working on a GWAS in a diploid plant species, using 6000 SNPs genotyped in 300 individuals. My SNP data contain several missing genotypes. Is there any way to impute them, given that no reference panel like the human HapMap project is available for this species? What should I do with my SNP dataset, or can I use it directly for the association study? Thanks, Vinod
This approach, called KNNcatImpute, searches for the k SNPs that are most similar to the SNP whose missing values need to be replaced and uses these k SNPs to impute the missing values. Alternatively, KNNcatImpute can search for the k nearest subjects. In this situation, the missing values of an individual are imputed by considering subjects showing a DNA pattern similar to the one of this individual.
So it doesn't have to use a reference panel but can impute your data based on similar individuals in your dataset.
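To make the idea concrete, here is a minimal Python sketch of the "k nearest subjects" variant. It is not the KNNcatImpute implementation itself: the function name `knn_impute_genotypes`, the 0/1/2 genotype coding with `np.nan` for missing calls, the simple mismatch distance, and the majority vote are all assumptions made for illustration.

```python
import numpy as np

def knn_impute_genotypes(geno, k=3):
    """Impute missing calls from the k most similar individuals.

    geno: (individuals x SNPs) array coded 0/1/2, np.nan = missing (assumed coding).
    Distance (assumed): fraction of mismatching calls over SNPs observed in both.
    """
    geno = np.asarray(geno, dtype=float)
    imputed = geno.copy()
    missing = np.isnan(geno)
    n = geno.shape[0]

    for i in np.where(missing.any(axis=1))[0]:
        # Distance from individual i to every other individual.
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                dists[j] = np.mean(geno[i, shared] != geno[j, shared])
        order = np.argsort(dists, kind="stable")

        for s in np.where(missing[i])[0]:
            # Use the k nearest individuals that actually have a call at SNP s.
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, s]][:k]
            if donors:
                values, counts = np.unique(geno[donors, s], return_counts=True)
                imputed[i, s] = values[np.argmax(counts)]  # majority genotype
    return imputed

# Toy data: individual 1 lacks a call at the last SNP.
toy = np.array([
    [0, 0, 2, 2],
    [0, 0, 2, np.nan],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])
toy_imputed = knn_impute_genotypes(toy, k=3)
```

With 300 individuals and 6000 SNPs the double loop is slow but workable. A call with no usable donor is left as missing.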
Edit: If you don't want to try imputation, whether your data can tell you anything depends entirely on the statistical approach you use. For example, a logistic regression completely breaks down once you have a couple of missing values. I've had good results with a compressed mixed linear model, as implemented in TASSEL or GAPIT, when missing data were present.
R/qtl can impute missing genotypes if you're working with an experimental cross (maybe I can assume that since you mention a model organism).
Depending on the amount of missing data, a common and easy option is to drop the markers with many missing calls. If you can be sure that missingness does not correlate with phenotype, you can also just ignore the missing calls in a per-marker analysis, testing each marker only on the individuals that were genotyped for it.
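Both options can be sketched in a few lines of Python. The function names, the 0/1/2 coding with `np.nan` for missing calls, the 20% default cutoff, and the use of a plain correlation as the per-marker statistic are assumptions for illustration, not a recommended GWAS model.

```python
import numpy as np

def filter_markers_by_missingness(geno, max_missing=0.2):
    """Drop SNP columns whose fraction of missing calls exceeds max_missing."""
    geno = np.asarray(geno, dtype=float)
    missing_rate = np.isnan(geno).mean(axis=0)
    keep = missing_rate <= max_missing
    return geno[:, keep], keep, missing_rate

def per_marker_correlation(geno, pheno):
    """Correlate each SNP with the phenotype, using only genotyped individuals."""
    geno = np.asarray(geno, dtype=float)
    pheno = np.asarray(pheno, dtype=float)
    r = np.full(geno.shape[1], np.nan)
    for s in range(geno.shape[1]):
        obs = ~np.isnan(geno[:, s])          # ignore missing calls at this marker
        g, y = geno[obs, s], pheno[obs]
        if g.size > 2 and g.std() > 0 and y.std() > 0:
            r[s] = np.corrcoef(g, y)[0, 1]
    return r

# Toy data: 4 individuals x 3 SNPs; SNP 1 is missing in half of them.
toy = np.array([
    [0, np.nan, 0],
    [1, np.nan, 1],
    [2, 2,      np.nan],
    [2, 0,      2],
])
toy_pheno = np.array([1.0, 2.0, 3.0, 3.0])

filtered, kept, rates = filter_markers_by_missingness(toy, max_missing=0.25)
r_values = per_marker_correlation(filtered, toy_pheno)
```

A sensible cutoff depends on how the missing calls are distributed across markers, so it is worth looking at `rates` before fixing `max_missing`.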