VCF to Plink files
7 weeks ago
I am hoping somebody with PLINK experience can help. I am trying to generate PLINK .bim, .fam and .bed files from two .vcf files (one with variants filtered out and one that keeps the variants), and have tried a few different commands that I found in Biostars posts and on Google.

The documentation on going from .vcf to PLINK files is a bit sparse, so I'd like to check with more experienced researchers here whether I am proceeding correctly.

My outcomes have fallen into two camps. For the .fam file, adding the --allow-extra-chr flag at the end of the command produced a file. For both the .bim and .bed files, I get an error:

Error: out.hg38NoVariants-temporary.pvar.zst has a split chromosome. Use
--make-pgen + --sort-vars to remedy this.

Below are the commands I am trying and the output/errors I am receiving. I would appreciate it if somebody could tell me whether my .fam files are correct and what I need to do to generate all the files, including exactly how to use "--make-pgen" and "--sort-vars".

Are these producing the correct .fam files?

./plink2 --vcf out.hg38KeepVariants.vcf --make-just-fam --out out.hg38KeepVariants --allow-extra-chr

Writing out.hg38KeepVariants.fam ... done.

My .bed command asks me to add the --allow-extra-chr flag, but after adding it I get a different error:

./plink2 --vcf out.hg38NoVariants.vcf --make-bed --out out.hg38NoVariants

Error: Invalid chromosome code '15_KI270727v1_random' on line 382274 of --vcf
(Use --allow-extra-chr to force it to be accepted.)

...now with the --allow-extra-chr flag added:

./plink2 --vcf out.hg38NoVariants.vcf --make-bed --out out.hg38NoVariants --allow-extra-chr

Error: out.hg38NoVariants-temporary.pvar.zst has a split chromosome. Use
--make-pgen + --sort-vars to remedy this.

...and generating a .bim file fails with or without the flag:

./plink2 --vcf out.hg38NoVariants.vcf --make-just-bim --out out.hg38NoVariants

Error: out.hg38NoVariants.vcf has a split chromosome. Use --make-pgen +
--sort-vars to remedy this.

I've preprocessed data before but never SNP data. Again, if anybody has experience with this pipeline, I'd appreciate your help. Thank you.

7 weeks ago

Have you tried doing what the error message says?

(Note that I’m writing this as an answer rather than just a comment. This is intentional.)
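
For context: as far as I know, plink2 reports a "split chromosome" when the records for one contig do not sit in a single contiguous block of the file, which can happen after merging or lifting over a VCF without re-sorting it. Below is a minimal sketch for checking this before conversion, assuming an uncompressed VCF. The helper name find_split_chroms is hypothetical and is not a plink2 feature.

```bash
# Print every contig whose records appear in more than one
# non-adjacent block of a VCF (hypothetical helper, not part of plink2).
find_split_chroms() {
    # $1: path to an uncompressed VCF
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            chrom = $1 ""                  # force string comparison
            if (chrom != prev) {           # a new block of records starts here
                if (chrom in seen) bad[chrom] = 1   # contig already had an earlier block
                seen[chrom] = 1
                prev = chrom
            }
        }
        END { for (c in bad) print c }
    ' "$1" | sort
}

# Usage: find_split_chroms out.hg38NoVariants.vcf
```

Any contig it prints is one whose records would need regrouping by --sort-vars; no output means each contig's records are already contiguous.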

Thank you. I added --allow-extra-chr and a .fam file was made. Is that output correct?

./plink2 --vcf out.hg38KeepVariants.vcf --make-just-fam --out out.hg38KeepVariants --allow-extra-chr

I then followed your advice and ran:

./plink2 --vcf out.hg38NoVariants.vcf --make-pgen --out out.hg38NoVariants --allow-extra-chr --sort-vars

...which generated a .pgen file.

I then used:

./plink2 --pfile out.hg38NoVariants --make-just-bim --out out.hg38NoVariants --allow-extra-chr

./plink2 --pfile out.hg38KeepVariants --make-bed --out out.hg38KeepVariants --allow-extra-chr

...which both ran through without error. Are they correct though?
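
One way to sanity-check the output is to compare counts against the input VCF. This is only a sketch: it assumes the VCF is uncompressed and that no variants or samples were filtered or skipped during import, otherwise the counts can legitimately differ. The .fam should have one line per sample column in the VCF header, the .bim one line per variant record, and the .bed should be exactly 3 + variants × ceil(samples / 4) bytes, because the PLINK 1 binary format is a 3-byte header followed by 2 bits per genotype in variant-major order. The helper names below are hypothetical.

```bash
# Number of sample columns on the #CHROM header line (columns after FORMAT).
vcf_sample_count() {
    awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }' "$1"
}

# Number of variant records (non-header, non-empty lines) in a VCF.
vcf_record_count() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Number of non-empty lines in a .fam or .bim file
# (one sample or one variant per line).
line_count() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Expected size in bytes of a variant-major .bed file.
# $1: number of samples, $2: number of variants
expected_bed_bytes() {
    echo $(( 3 + $2 * ( ($1 + 3) / 4 ) ))
}

# Example comparison, using file names from the commands above:
#   vcf_sample_count out.hg38KeepVariants.vcf   # should match: line_count out.hg38KeepVariants.fam
#   vcf_record_count out.hg38KeepVariants.vcf   # should match: line_count out.hg38KeepVariants.bim
#   wc -c < out.hg38KeepVariants.bed            # should match: expected_bed_bytes <samples> <variants>
```

Matching counts do not prove the genotypes are right, but a mismatch is a quick sign that something was dropped or misread during conversion.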


