For a PCA analysis, I have a set of coordinates shared between our in-house exome data and the 1000 Genomes data (Phase 1). I want to retrieve the genotypes for those common SNPs from the 1000 Genomes VCF files (and then convert them to PLINK format for smartPCA). One option I thought of is converting all the Phase 1 VCF files to PED, but that is very memory intensive. What would be a possible solution to this problem?
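One low-memory way to approach this, offered as a rough sketch rather than a tested pipeline: stream each VCF one line at a time and keep only the records whose (chromosome, position) pair is in the common set, so that only the much smaller subset is ever converted to PLINK format. The function names (`load_sites`, `filter_vcf`, `extract_common`) and the input layout (a whitespace-delimited file with chromosome and position in the first two columns) are assumptions made for illustration, and the sketch also assumes chromosome names match between the two datasets (e.g. `1` rather than `chr1`).

```python
import gzip


def load_sites(lines):
    """Read 'chrom pos' lines into a set of (chrom, pos) tuples.

    Assumed layout: chromosome and 1-based position in the first two
    whitespace-separated columns; blank and '#' lines are skipped.
    """
    sites = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split()[:2]
        sites.add((chrom, int(pos)))
    return sites


def filter_vcf(vcf_lines, sites):
    """Yield VCF header lines plus records whose (CHROM, POS) is in sites.

    Lines are processed one at a time, so memory use depends on the size
    of the site set, not on the size of the VCF.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        # Only the first two columns are needed to decide; avoid splitting
        # the (very wide) genotype columns.
        chrom, pos, _rest = line.split("\t", 2)
        if (chrom, int(pos)) in sites:
            yield line


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def extract_common(sites_path, vcf_path, out_path):
    """Write the subset of vcf_path that overlaps the sites in sites_path."""
    with open_text(sites_path) as fh:
        sites = load_sites(fh)
    with open_text(vcf_path) as vcf, open_text(out_path, "wt") as out:
        for line in filter_vcf(vcf, sites):
            out.write(line)
```

Calling `extract_common` once per chromosome VCF would leave a set of small VCFs containing only the shared SNPs, which can then be converted for smartPCA. Indexed tools such as tabix or bcftools can do the same kind of position-based extraction and are likely faster on the full Phase 1 files.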
thanks, I will try this :)
Consider using the 1000 Genomes data to impute the genotypes of the SNPs missing from your dataset: http://www.1000genomes.org/faq/can-i-use-1000-genomes-data-imputation