I'm using plink's linkage disequilibrium logic to filter my variants. The team I'm working with needs everything to be kept as VCF files, but one problem is that the original VCF files don't contain any variant IDs, so when I later use plink's --extract command I end up with no variants. I tried using plink's --recover-var-ids arg to hopefully give a hint to plink that it needs to parse the --extract variant IDs, but plink complains that --recover-var-ids is not a recognized flag. My commands look like this:
```
plink \
--vcf $INPUT_FILE_PATH \
--indep $LD_WINDOW_SIZE_KB $LD_STEP_SIZE $VIF_THRESHOLD \
--out $LD_FILE_PATH \
--set-missing-var-ids @:#[b38]\$1,\$2 \
--allow-extra-chr

plink \
--vcf /inputs/data.vcf.gz \
--extract ${LD_FILE_PATH}.prune.in \
--out /outputs/data.vcf.gz \
--recover-var-ids \
--recode vcf
```
Would this not be a problem if I used plink's --bfile instead of VCF files? I suppose I could just add another step to convert the bed files back to VCFs.
--recover-var-ids, and decent VCF re-export capability, require plink 2.0. (plink 1.9 doesn't even keep REF/ALT allele order straight by default, because there was no way to do so without breaking compatibility with plink 1.07.)

However, you don't want --recover-var-ids here. (When you do use it, you must also provide a file with the IDs; see https://www.cog-genomics.org/plink/2.0/data#recover_var_ids for details.) Instead, you have the right idea with --bfile, except that you probably want to use --pfile/--make-pgen instead, since that format can preserve many more types of information from the VCF. Alternatively, you could include the same --set-missing-var-ids template in the second command that you used in the first.
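A minimal sketch of both suggestions, written as bash functions. The function names (prune_with_template, prune_via_pgen), the _ld/_work output suffixes, and the plink2 ID template @:#:$r:$a are hypothetical choices for illustration. The pgen route assumes plink 1.9 is still used for the VIF-based --indep step from the question, and that all variants are biallelic, because a .bed fileset can't hold multiallelic variants. Treat it as a starting point to check against the plink docs, not a tested pipeline.

```shell
#!/usr/bin/env bash
# Sketch: LD-prune a VCF whose variants have no IDs, then write a VCF back out.
# Function names and output suffixes are hypothetical; adjust to your paths.
set -euo pipefail

# Route 1 (plink 1.9 only): regenerate IDs with the SAME template in both
# commands, so the IDs in .prune.in match the IDs that --extract sees.
ID_TEMPLATE='@:#[b38]$1,$2'   # template from the question, quoted once here

prune_with_template() {
  local vcf="$1" out="$2" window="$3" step="$4" vif="$5"

  # Step 1: VIF pruning; writes ${out}_ld.prune.in
  plink --vcf "$vcf" \
    --set-missing-var-ids "$ID_TEMPLATE" \
    --allow-extra-chr \
    --indep "$window" "$step" "$vif" \
    --out "${out}_ld"

  # Step 2: reload the VCF, assign identical IDs, extract, export.
  # --keep-allele-order keeps the VCF's REF/ALT order (plink 1.9 may swap it otherwise).
  # plink appends ".vcf" to --out, so pass a bare prefix.
  plink --vcf "$vcf" \
    --set-missing-var-ids "$ID_TEMPLATE" \
    --allow-extra-chr \
    --keep-allele-order \
    --extract "${out}_ld.prune.in" \
    --recode vcf \
    --out "$out"
}

# Route 2 (plink2 + pgen): assign IDs once at import, keep the pgen fileset
# as the master copy, and export the final VCF from it.
prune_via_pgen() {
  local vcf="$1" out="$2" window="$3" step="$4" vif="$5"
  local work="${out}_work"

  # Import once; IDs are stored in the .pvar from here on.
  # (If plink2 complains about long alleles in the IDs, see --new-id-max-allele-len.)
  plink2 --vcf "$vcf" \
    --set-missing-var-ids '@:#:$r:$a' \
    --allow-extra-chr \
    --make-pgen --out "$work"

  # Assumption: pruning still uses plink 1.9 --indep, which reads .bed,
  # so write a .bed copy; the variant IDs carry over into the .bim.
  plink2 --pfile "$work" --allow-extra-chr --make-bed --out "$work"

  plink --bfile "$work" --allow-extra-chr \
    --indep "$window" "$step" "$vif" --out "${out}_ld"

  # Export from the pgen fileset, which retains more of the VCF's information.
  plink2 --pfile "$work" --allow-extra-chr \
    --extract "${out}_ld.prune.in" \
    --export vcf bgz --out "$out"
}
```

For example, prune_with_template /inputs/data.vcf.gz /outputs/data "$LD_WINDOW_SIZE_KB" "$LD_STEP_SIZE" "$VIF_THRESHOLD" should leave the result in /outputs/data.vcf, while prune_via_pgen with the same arguments should write a bgzipped /outputs/data.vcf.gz. The important property of both is that every --extract sees exactly the IDs that were written to .prune.in.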