GWAS SNP Imputation in Plant Species
11.1 years ago
kumar.vinod81 ▴ 330

I am working on a GWAS in a diploid plant species using 6,000 SNPs genotyped in 300 individuals. My SNP data have several missing genotypes. Is there any way to impute my SNP dataset, given that no reference panel like the human HapMap project is available? What should I do with my SNP dataset, or can I use it directly for the association study? Thanks, Vinod

snp imputation • 5.3k views
11.1 years ago

Try this:

Imputing missing genotypes with weighted k nearest neighbors.

This approach, called KNNcatImpute, searches for the k SNPs that are most similar to the SNP whose missing values need to be replaced and uses these k SNPs to impute the missing values. Alternatively, KNNcatImpute can search for the k nearest subjects. In this situation, the missing values of an individual are imputed by considering subjects showing a DNA pattern similar to the one of this individual.

So it doesn't have to use a reference panel but can impute your data based on similar individuals in your dataset.
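To make the "nearest subjects" idea concrete, here is a minimal sketch in plain Python. It is not the KNNcatImpute implementation itself, only an illustration under assumptions of my own: genotypes are coded 0/1/2 with `None` for a missing call, distance is the fraction of mismatching calls over markers observed in both individuals, and each neighbour votes with weight 1/(distance + eps). The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def mismatch_distance(a, b):
    """Fraction of differing calls among markers observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no overlapping calls, so the distance is undefined
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute_subjects(genotypes, k=3, eps=1e-6):
    """Fill None entries by a distance-weighted vote of the k nearest individuals.

    Assumed layout: one row per individual, one column per SNP.
    Distances and donor calls always come from the original data,
    so imputed values never feed into later imputations.
    """
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        missing = [j for j, g in enumerate(row) if g is None]
        if not missing:
            continue
        # Rank all other individuals by their distance to individual i.
        neighbours = []
        for m, other in enumerate(genotypes):
            if m == i:
                continue
            d = mismatch_distance(row, other)
            if d is not None:
                neighbours.append((d, m))
        neighbours.sort()
        for j in missing:
            # Use the k closest individuals that actually have a call at SNP j.
            donors = [(d, m) for d, m in neighbours
                      if genotypes[m][j] is not None][:k]
            if not donors:
                continue  # nobody can donate a call; leave it missing
            votes = defaultdict(float)
            for d, m in donors:
                votes[genotypes[m][j]] += 1.0 / (d + eps)
            imputed[i][j] = max(votes, key=votes.get)
    return imputed


# Toy data (made up): 5 individuals x 5 SNPs, one missing call in the first row.
toy = [
    [0, 0, 2, 2, None],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 1, 1],
    [2, 2, 0, 0, 0],
    [2, 2, 0, 0, 0],
]
result = knn_impute_subjects(toy, k=2)
```

With 300 individuals and 6,000 SNPs this brute-force version should still be workable, but you would want to try several values of k and check accuracy by masking some known calls and seeing how often they are recovered.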

Edit: If you don't want to try imputation, whether your data can tell you anything depends entirely on the statistical approach you use. For example, a logistic regression breaks down completely once you have a couple of missing values. I've had good results with a compressed mixed linear model as implemented in TASSEL or GAPIT when missing data were present.


Yes, I am also getting results with both of the tools you mentioned, but I want to check how the results change after imputing the missing values. I'll try imputation with KNNcatImpute. Thanks for your valuable help.


One more thing I forgot in my last reply: GAPIT does run the analysis, but it automatically converts all missing values to heterozygotes, which is why I want to impute my data first.

11.1 years ago
matted 7.8k

R/qtl can impute missing genotypes if you're working with an experimental cross (maybe I can assume that since you mention a model organism).

Depending on the amount of missing data, a common and easy thing to do is to drop the markers with many missing calls. If you can be sure missingness does not correlate with phenotype, you can also just ignore the missing calls in a per-marker analysis.
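As a rough sketch of the marker-filtering step, something like the following would do. The coding (`None` for a missing call), the function names, and the 20% cutoff are all assumptions for illustration, not recommended values; pick a threshold that suits your data.

```python
def missing_rate_per_marker(genotypes):
    """Fraction of individuals with a missing call (None) at each marker."""
    n_individuals = len(genotypes)
    n_markers = len(genotypes[0])
    return [
        sum(row[j] is None for row in genotypes) / n_individuals
        for j in range(n_markers)
    ]


def drop_sparse_markers(genotypes, max_missing=0.2):
    """Keep only markers whose missing-call rate is at most max_missing.

    Returns the filtered matrix and the column indices that were kept,
    so the surviving markers can be mapped back to their SNP names.
    """
    rates = missing_rate_per_marker(genotypes)
    keep = [j for j, rate in enumerate(rates) if rate <= max_missing]
    filtered = [[row[j] for j in keep] for row in genotypes]
    return filtered, keep


# Toy data (made up): 4 individuals x 3 markers.
calls = [
    [0, None, 2],
    [1, None, 2],
    [0, 1, None],
    [2, 1, 0],
]
rates = missing_rate_per_marker(calls)
filtered, kept = drop_sparse_markers(calls, max_missing=0.25)
```

Looking at the distribution of `rates` before choosing a cutoff shows how many of the markers you would lose at each threshold.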


Thanks for your help. I have not tried R/qtl; could you briefly explain on what basis it imputes the missing values? I can't ignore the missing values, as this is a GWAS and there are a lot of them. Your help is really pointing me towards important issues that I was not aware of earlier. Thanks.
