I have next-generation exome sequencing data and genotype calls for a set of case-control samples, and I would like to confirm that the rare variant I identified is not an artifact of population admixture. Is there a good source of sequencing data that can be used to control for this? I don't have GWAS data for these samples, only exome sequencing, so I cannot do a standard PC analysis. I think that if we perform imputation with IMPUTE2 using phased reference haplotypes, the identified mutation will disappear if it doesn't match the background haplotype. Any suggestions? Thanks.
Not sure I understand how the mutations would disappear. The ones of interest are usually rare, so they can arise on any haplotype.
Besides that, I think the exome still contains an appreciable number of "frequent" variants (obviously nowhere near as many with MAF > 0.3 as a GWAS genotyping array). This is a good basis for PCA-like analyses (the MDS in PLINK is not bad).
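A minimal sketch of that idea, assuming your calls are already in a samples × variants matrix of alt-allele counts (0/1/2) with no missing genotypes, for example exported from a multi-sample VCF. This is not PLINK's MDS; it swaps in a plain PCA on standardized genotypes (each variant scaled by sqrt(2p(1-p))), and keeps only variants with MAF ≥ 0.05 (the threshold is an assumption). The function name `common_variant_pcs` and the simulated two-population data are hypothetical, for illustration only.

```python
import numpy as np

def common_variant_pcs(genotypes, min_maf=0.05, n_pcs=2):
    """PCA on common variants only.

    genotypes: samples x variants array of alt-allele counts (0, 1, 2),
    assumed to have no missing calls.
    Returns (PC scores of shape samples x n_pcs, number of variants kept).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0              # alt-allele frequency in this sample
    maf = np.minimum(p, 1.0 - p)
    keep = maf >= min_maf                 # drop rare and monomorphic variants
    g, p = g[:, keep], p[keep]
    # Center and scale each variant: (g - 2p) / sqrt(2p(1 - p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors times singular values are the PC scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs], int(keep.sum())

# Hypothetical example: two populations with shifted allele frequencies,
# plus some very rare variants that the MAF filter should discard.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.5, 500)
freq_b = freq_a + rng.uniform(0.2, 0.4, 500)
common = np.vstack([rng.binomial(2, freq_a, (30, 500)),
                    rng.binomial(2, freq_b, (30, 500))])
rare = rng.binomial(2, 0.005, (60, 200))
pcs, n_kept = common_variant_pcs(np.hstack([common, rare]))
print(n_kept, np.sign(pcs[:30, 0]), np.sign(pcs[30:, 0]))
```

In the simulation PC1 should split the two groups; on real data you would look for cases and controls occupying different regions of the PC1/PC2 plot.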
Ng et al. show that more than 70% of one person's nonsynonymous (ns) SNPs are common. I would guess this is even more the case for synonymous SNPs. So although you will not have the same resource as with GWAS data, you can still match your samples against 1000 Genomes, for instance.
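A rough sketch of the 1000 Genomes matching step, assuming you have already intersected your exome sites with the reference panel and coded the same alternate allele in both. PCs are fitted on the reference samples only, your samples are projected onto them using the reference allele frequencies, and each sample is assigned to the nearest reference population centroid. The function names, the population labels `POP_A`/`POP_B`, and the simulated data are hypothetical; missing genotypes are not handled.

```python
import numpy as np

def fit_reference_pcs(ref_genotypes, n_pcs=2):
    """Fit PCs on reference samples (samples x variants, alt-allele counts)."""
    g = np.asarray(ref_genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)              # polymorphic in the reference only
    p = p[keep]
    z = (g[:, keep] - 2 * p) / np.sqrt(2 * p * (1 - p))
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_pcs].T               # variants x PCs
    return keep, p, loadings, z @ loadings

def project(genotypes, keep, p, loadings):
    """Project new samples using the reference frequencies and loadings."""
    g = np.asarray(genotypes, dtype=float)[:, keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ loadings

def nearest_population(study_scores, ref_scores, ref_labels):
    """Label each projected sample with the closest reference centroid."""
    labels = np.asarray(ref_labels)
    pops = sorted(set(ref_labels))
    centroids = np.array([ref_scores[labels == pop].mean(axis=0) for pop in pops])
    dist = np.linalg.norm(study_scores[:, None, :] - centroids[None, :, :], axis=2)
    return [pops[i] for i in dist.argmin(axis=1)]

# Hypothetical reference panel with two populations, plus 10 study samples.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.05, 0.5, 400)
freq_b = freq_a + rng.uniform(0.2, 0.4, 400)
ref = np.vstack([rng.binomial(2, freq_a, (40, 400)),
                 rng.binomial(2, freq_b, (40, 400))])
ref_labels = ["POP_A"] * 40 + ["POP_B"] * 40
study = np.vstack([rng.binomial(2, freq_a, (5, 400)),
                   rng.binomial(2, freq_b, (5, 400))])
keep, p, loadings, ref_scores = fit_reference_pcs(ref)
assigned = nearest_population(project(study, keep, p, loadings), ref_scores, ref_labels)
print(assigned)
```

Projected scores tend to shrink toward zero relative to the reference samples, so the nearest-centroid labels are better treated as a check for gross ancestry mismatches between cases and controls than as a fine-grained ancestry estimate.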
With VAAST we distribute a background file containing many ethnic groups from the 1000 Genomes Project. This seems to offset population stratification. I would be happy to help you try VAAST; all you need are the VCF variant files. We have had success identifying several disease-causing genes and genes underlying morphological traits.