For PCA, I have a set of coordinates shared between my in-house exome data and the 1000 Genomes data (Phase 1). I want to retrieve genotypes for those common SNPs from the 1000 Genomes VCF files (then convert them to PLINK format for smartPCA). One option I considered is converting all the Phase 1 VCF files to .ped, but that is very memory intensive. What would be a possible solution to this problem?
PLINK 1.9 supports direct conversion of VCF to .bed + .bim + .fam, which should be readable by smartpca. For example:
plink --vcf ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz --out 1000g_chr1
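Since only the common SNPs are needed, the conversion can also filter while it reads each VCF. What follows is a minimal sketch, not a tested pipeline. It assumes a hypothetical file common_snps.txt with one "chrom pos" pair per line, and it assumes the other chromosomes follow the same file naming pattern as the chr1 file above. The function names are made up for illustration; --extract range and --make-bed are PLINK 1.9 flags.

```shell
#!/usr/bin/env bash
# Sketch only: file names and the "chrom pos" input layout are assumptions.

# Turn a two-column "chrom pos" list into the four-column range format
# (chrom, start bp, end bp, label) that PLINK reads with --extract range.
make_range_file() {
    awk '{ print $1, $2, $2, "snp" NR }' "$1" > "$2"
}

# Convert one chromosome's VCF to .bed/.bim/.fam, keeping only variants
# whose position falls inside one of the listed ranges.
convert_chr() {
    local chr="$1" ranges="$2"
    plink --vcf "ALL.chr${chr}.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz" \
          --extract range "$ranges" \
          --make-bed \
          --out "1000g_chr${chr}"
}

# Example usage (uncomment once the input files are in place):
# make_range_file common_snps.txt common_ranges.txt
# for chr in $(seq 1 22); do convert_chr "$chr" common_ranges.txt; done
```

Filtering by position keeps any indel or structural variant that sits at a listed coordinate, so the resulting .bim files may still need a check against the exome alleles.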
I have used --merge-list, but it throws an error and warnings about SNP inconsistencies. As I understand it, merging is meant for two files that contain the same SNPs for different individuals, whereas I just want to concatenate VCF files from different chromosomes, say chr1 and chr2, into one single PLINK file. In short, I want to concatenate all the 1000 Genomes VCF files into one PLINK file.
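For reference, PLINK's merge does not require the filesets to share SNPs: by default it takes the union of variants and samples, so merging per-chromosome filesets for the same individuals amounts to concatenation. The sketch below shows one way to set that up, assuming each chromosome has already been converted to a binary fileset named 1000g_chr1 ... 1000g_chr22 as in the command above. The function names and the output prefix 1000g_all are hypothetical. The error text is not shown in the post, so this does not attempt to diagnose it.

```shell
#!/usr/bin/env bash
# Sketch only: assumes binary filesets 1000g_chr1 ... 1000g_chr22 exist.

# Write one fileset prefix per line for chromosomes 2-22; a single token
# per line is read by --merge-list as a .bed/.bim/.fam prefix.
make_merge_list() {
    local out="$1"
    : > "$out"
    for chr in $(seq 2 22); do
        echo "1000g_chr${chr}" >> "$out"
    done
}

# Merge chromosomes 2-22 into the chr1 fileset and write one combined fileset.
merge_all() {
    plink --bfile 1000g_chr1 \
          --merge-list "$1" \
          --make-bed \
          --out 1000g_all
}

# Example usage (uncomment once the per-chromosome filesets exist):
# make_merge_list merge_list.txt
# merge_all merge_list.txt
```

If the merge still stops with an error, the log and any .missnp file that PLINK writes are the place to look for the specific variants involved.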