Question: Convert and filter VCF by a pre-existing BIM
jlawlor (USA/Huntsville/HudsonAlpha Institute for Biotechnology) wrote, 24 months ago:

Hi all, I'm trying to convert a VCF to PLINK format (BED/BIM/FAM), but I'd like to do so using a fixed, pre-existing BIM file. In other words, I'd like to filter an arbitrary VCF down to the SNPs listed in a particular BIM file (and also add in a reference call or no-call if the SNP isn't in that VCF).

Is this possible using built-in commands for PLINK? (I'd prefer not to go the route of manually editing PED/MAP files so that I don't inadvertently swap data.)

Context: I'm running ancestry analysis with ADMIXTURE (https://www.genetics.ucla.edu/software/admixture/) in projection mode to project a new sample onto an already-analyzed reference population, which requires identical BIM files. I can do this manually by converting, joining, and manipulating the files; however, I'd like to integrate this into an automated pipeline.

Thanks!

Tags: plink, admixture, vcf

chrchang523 (United States) wrote, 24 months ago:

Assuming your variants have unique IDs, you can use plink --write-snplist (or Unix "cut -d [delimiter] -f 2") on the .bim file to create a list of variant IDs to keep, and then plink --extract to keep just those variants in another dataset.
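
The two steps above might look like the following. This is a minimal sketch rather than a tested pipeline: the file names (`demo_reference.bim`, `keep_ids.txt`, `input.vcf`, `test_set`) are placeholders, and the three-variant .bim is made-up data so the `cut` step can be run on its own. The PLINK call is only attempted when a `plink` binary and `input.vcf` are actually present.

```shell
#!/bin/sh
# Demo .bim (hypothetical data). Columns, tab-delimited:
# chromosome, variant ID, cM position, bp position, allele 1, allele 2.
printf '1\trs111\t0\t1000\tA\tG\n1\trs222\t0\t2000\tC\tT\n2\trs333\t0\t1500\tG\tA\n' > demo_reference.bim

# Step 1: pull the variant IDs (column 2) into a keep-list.
# cut splits on tabs by default; add -d ' ' if the .bim is space-delimited.
cut -f 2 demo_reference.bim > keep_ids.txt

# Check the "unique IDs" assumption: any ID written here occurs more than once.
sort keep_ids.txt | uniq -d > duplicate_ids.txt

# Step 2: keep only those variants while converting the VCF.
# Skipped unless both plink and input.vcf (placeholder name) are available.
if command -v plink >/dev/null 2>&1 && [ -f input.vcf ]; then
    plink --vcf input.vcf --extract keep_ids.txt --make-bed --out test_set
fi
```

With the demo data, `keep_ids.txt` holds three IDs and `duplicate_ids.txt` is empty. If the full reference fileset (.bed/.bim/.fam) is on hand, `plink --write-snplist` can stand in for the `cut` step.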


(Evidently --extract works with a BIM file without the Unix cut command. Handy!)

That's nearly what I need. However, the output BIM (let's call it test_set.bim) now has fewer variants than the original "ids_to_keep.bim". Is there a flag I can add or a second step I can perform to fill in the missing genotypes (preferably with no-calls)?

So far I've tried --fill-missing-a2, but that doesn't seem to be what I want, or I've made some mistake somewhere, since it gives me the error "Error: --fill-missing-a2 cannot be used on an unsorted .bim file."

Using: ~/tools/plink/plink --extract ids_to_keep.bim --vcf input.vcf --make-bed --out test_set
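
To see exactly which variants were dropped, the two .bim files can be compared directly. The sketch below is only a diagnostic and does not fill in any genotypes; the `demo_` files are made-up stand-ins for `ids_to_keep.bim` and `test_set.bim`, and `missing_ids.txt` is a hypothetical output name.

```shell
#!/bin/sh
# Hypothetical demo files standing in for ids_to_keep.bim and test_set.bim.
REF_BIM=demo_ids_to_keep.bim
OUT_BIM=demo_test_set.bim
printf '1\trs111\t0\t1000\tA\tG\n1\trs222\t0\t2000\tC\tT\n2\trs333\t0\t1500\tG\tA\n' > "$REF_BIM"
printf '1\trs111\t0\t1000\tA\tG\n2\trs333\t0\t1500\tG\tA\n' > "$OUT_BIM"

# First file (NR == FNR): remember every variant ID present in the output .bim.
# Second file: print reference IDs that were never seen, i.e. the dropped variants.
awk 'NR == FNR { seen[$2] = 1; next } !($2 in seen) { print $2 }' \
    "$OUT_BIM" "$REF_BIM" > missing_ids.txt

# Number of reference variants absent from the converted dataset.
wc -l < missing_ids.txt
```

With the demo data, `missing_ids.txt` contains the single ID `rs222`. Pointing `REF_BIM` and `OUT_BIM` at the real files (and deleting the two `printf` lines) gives the list of variants that would still need to be added back.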

Thanks for the advice!

(Reply written 24 months ago by jlawlor)