Question: Convert and filter VCF by a pre-existing BIM
3.4 years ago
jlawlor
wrote:

Hi all, I'm trying to convert a VCF to PLINK format (BED/BIM/FAM), however I'd like to do so using a fixed, pre-existing BIM file. In other words, I'd like to filter an arbitrary VCF down to the SNPs listed in a particular BIM file (and also add in a reference or no-call if the SNP isn't in said VCF).

Is this possible using built-in commands for PLINK? (I'd prefer not to go the route of manually editing PED/MAP files so that I don't inadvertently swap data.)

Context: I'm running ancestry analysis with ADMIXTURE in projection mode to project a new sample onto an already-analyzed reference population, which requires identical BIM files. I can do this manually by converting, joining, and manipulating the files; however, I'd like to integrate this into an automated pipeline.


plink admixture vcf
written 3.4 years ago by jlawlor
3.4 years ago
chrchang523
wrote:

Assuming your variants have unique IDs, you can use plink --write-snplist (or Unix "cut -d [delimiter] -f 2") on the .bim file to create a list of variant IDs to keep, and then plink --extract to keep just those variants in another dataset.
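
A minimal sketch of the cut-based route, in case it helps. The file names demo_ref.bim, demo_ids.txt, demo_dup_ids.txt and input.vcf are placeholders, and the sketch assumes the .bim is tab-delimited (pass -d to cut otherwise). The PLINK step runs only if plink is on the PATH and input.vcf exists.

```shell
# Hypothetical demo BIM; in practice, point this at the real reference .bim.
# BIM columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
printf '1\trs100\t0\t1000\tA\tG\n1\trs200\t0\t2000\tC\tT\n2\trs300\t0\t500\tG\tA\n' > demo_ref.bim

# Column 2 holds the variant ID; cut splits on tabs by default.
cut -f 2 demo_ref.bim > demo_ids.txt

# List any ID that occurs more than once, since --extract matches on ID alone.
sort demo_ids.txt | uniq -d > demo_dup_ids.txt

# Keep only the listed variants while converting the VCF (skipped if plink
# or the assumed input.vcf is not available).
if command -v plink >/dev/null 2>&1 && [ -f input.vcf ]; then
    plink --vcf input.vcf --extract demo_ids.txt --make-bed --out test_set
fi
```

If demo_dup_ids.txt comes out non-empty, the "unique IDs" assumption does not hold and the IDs would need fixing before extracting.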

written 3.4 years ago by chrchang523

(Evidently --extract works directly with a BIM file, without the Unix cut command. Handy!)

That's nearly what I need. However, the output BIM (let's call it test_set.bim) now has fewer variants than the original ids_to_keep.bim. Is there a flag I can add or a second step I can perform to fill in the missing genotypes? (Preferably with no-calls.)

So far I've tried --fill-missing-a2, but either that isn't what I want or I've made a mistake somewhere, since it fails with "Error: --fill-missing-a2 cannot be used on an unsorted .bim file."

Using: ~/tools/plink/plink --extract ids_to_keep.bim --vcf input.vcf --make-bed --out test_set
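
For reference, a rough way to list which variants were dropped is to compare the two BIM files on the ID column. This is only a sketch: demo_ref.bim and demo_out.bim are made-up stand-ins for ids_to_keep.bim and test_set.bim, and it assumes variant IDs are unique and that the output BIM is not empty.

```shell
# Hypothetical demo files standing in for ids_to_keep.bim and test_set.bim.
printf '1\trs100\t0\t1000\tA\tG\n1\trs200\t0\t2000\tC\tT\n2\trs300\t0\t500\tG\tA\n' > demo_ref.bim
printf '1\trs100\t0\t1000\tA\tG\n2\trs300\t0\t500\tG\tA\n' > demo_out.bim

# First file (NR==FNR): remember every variant ID present in the output BIM.
# Second file: print each reference BIM row whose ID was never seen.
awk 'NR==FNR { seen[$2] = 1; next } !($2 in seen)' demo_out.bim demo_ref.bim > demo_missing.bim

# Show the IDs of the variants that did not make it into the output.
cut -f 2 demo_missing.bim   # prints rs200
```

The rows in demo_missing.bim are the variants that would still need to be filled in to get back to the full reference BIM.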

Thanks for the advice!

written 3.4 years ago by jlawlor
