Hi all, I'm trying to convert a VCF to PLINK format (BED/BIM/FAM), but I'd like to do so using a fixed, pre-existing BIM file. In other words, I'd like to filter an arbitrary VCF down to the SNPs listed in a particular BIM file, and add a reference or no-call for any SNP that isn't in the VCF.
Is this possible using built-in PLINK commands? (I'd prefer not to edit PED/MAP files by hand, so that I don't inadvertently swap data.)
Context: I'm running ancestry analysis with ADMIXTURE (https://www.genetics.ucla.edu/software/admixture/) in projection mode to project a new sample onto an already-analyzed reference population, which requires identical BIM files. I can do this manually by converting, joining, and manipulating the files; however, I'd like to integrate this into an automated pipeline.
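Because projection needs the new sample's .bim to line up with the reference one, a pipeline step that compares the two files before ADMIXTURE runs can catch mismatches early. Here is a minimal POSIX shell sketch, assuming standard six-column .bim files. The function name bim_match is hypothetical. It compares chromosome, ID, position and both alleles line by line and deliberately ignores the centimorgan column. If you need byte-identical files, `cmp -s` on the two files is enough.

```shell
# bim_match REF_BIM NEW_BIM  (hypothetical helper name)
# Exits 0 only if both .bim files list the same variants in the same order.
# Compares chromosome ($1), variant ID ($2), bp position ($4) and the two
# allele columns ($5, $6); the centimorgan column ($3) is ignored.
bim_match() {
  awk '
    NR == FNR { ref[FNR] = $1 " " $2 " " $4 " " $5 " " $6; n = FNR; next }
    {
      key = $1 " " $2 " " $4 " " $5 " " $6
      if (!(FNR in ref) || ref[FNR] != key) {
        print "mismatch at line " FNR ": " key
        bad = 1
        exit
      }
      m = FNR
    }
    END {
      if (bad) exit 1
      if (m != n) { print "line counts differ: " n " vs " m; exit 1 }
    }
  ' "$1" "$2"
}
```

For example, `bim_match ids_to_keep.bim test_set.bim || exit 1` would stop the pipeline before ADMIXTURE if the files disagree.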
Thanks!
(Evidently --extract works directly on a BIM file, without first pulling out the ID column with the Unix cut command. Handy!)
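If you'd rather not rely on how --extract treats the other .bim columns, the ID column can be pulled out explicitly first. This is a minimal sketch. The helper name bim_ids is hypothetical. It uses awk instead of cut so that it also copes with space-delimited .bim files.

```shell
# bim_ids BIM_FILE  (hypothetical helper name)
# Prints the variant ID column ($2) of a .bim file, one ID per line,
# so the result can be passed to plink --extract as a plain ID list.
bim_ids() {
  awk '{ print $2 }' "$1"
}
```

Usage: `bim_ids ids_to_keep.bim > ids_to_keep.txt`, then pass `--extract ids_to_keep.txt` to the same plink command.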
That's nearly what I need; however, the output BIM (call it test_set.bim) now has fewer variants than the original ids_to_keep.bim. Is there a flag I can add, or a second step I can run, to fill in the missing variants? (Preferably as no-calls.)
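Before deciding how to fill the gap, it helps to know exactly which variants were dropped. This minimal sketch lists the IDs in ids_to_keep.bim that did not make it into test_set.bim. The helper name missing_ids is hypothetical. It matches on variant ID only (column 2), not on position or alleles.

```shell
# missing_ids TARGET_BIM OUTPUT_BIM  (hypothetical helper name)
# Prints IDs present in TARGET_BIM (e.g. ids_to_keep.bim) but absent from
# OUTPUT_BIM (e.g. test_set.bim), in TARGET_BIM order.
missing_ids() {
  awk '
    FILENAME == ARGV[1] { seen[$2] = 1; next }  # OUTPUT_BIM: IDs that survived
    !($2 in seen) { print $2 }                  # TARGET_BIM: report the rest
  ' "$2" "$1"
}
```

For example, `missing_ids ids_to_keep.bim test_set.bim > missing.txt`. This only identifies the gap; it does not add the no-call genotypes.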
So far I've tried --fill-missing-a2, but either it doesn't do what I want or I've made a mistake somewhere: it fails with "Error: --fill-missing-a2 cannot be used on an unsorted .bim file."
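Since the error complains about an unsorted .bim, a quick check can show where the ordering breaks. This sketch assumes "sorted" means each chromosome code forms one contiguous block with non-decreasing base-pair positions inside it. PLINK's own criterion may be stricter, for example about the order of chromosomes relative to each other, so treat this as a diagnostic only. The name bim_check_sorted is hypothetical.

```shell
# bim_check_sorted BIM_FILE  (hypothetical helper name)
# Prints each line where the ordering breaks and exits non-zero if any do.
# Assumed definition of "sorted": every chromosome code ($1) occurs in a single
# contiguous block, and bp positions ($4) never decrease within that block.
bim_check_sorted() {
  awk '
    NR > 1 && $1 == chr && $4 + 0 < pos + 0 {
      print "line " NR ": position " $4 " after " pos " on chromosome " $1
      bad = 1
    }
    NR > 1 && $1 != chr {
      closed[chr] = 1              # the previous chromosome block has ended
      if ($1 in closed) {
        print "line " NR ": chromosome " $1 " reappears"
        bad = 1
      }
    }
    { chr = $1; pos = $4 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, `bim_check_sorted test_set.bim` prints nothing and exits 0 when the file passes this (assumed) check.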
The extraction command I'm using: ~/tools/plink/plink --extract ids_to_keep.bim --vcf input.vcf --make-bed --out test_set
Thanks for the advice!