Question: Convert 80 VCF to PED and MAP plink files
3.5 years ago
lmobuchon40 wrote:

Hi everyone,

I am new to PLINK, so I apologize for the naive question. I have 80 VCF files (one per patient), and I would like to create PLINK files (MAP and PED) for further analyses. I tried opening them with plink2, converting them to BCF with BCFtools, and creating PLINK files with VCFtools, but all of these gave me errors. Do you think I need to merge the 80 VCFs first? Do you have any other ideas?

Thanks a lot in advance. Best, Lenha

plink format vcf
modified 9 months ago by zx87548.4k • written 3.5 years ago by lmobuchon40

You should be able to convert a single VCF to PLINK format using plink --vcf <input_vcf_name> --recode --out <output_plink_name>. Can you convert one file successfully? If not, what errors does it give you?

written 3.5 years ago by leekaiinthesky170

Thank you very much! Actually, I tried:

./plink --vcf *.vcf --out all

And the error is:

Random number seed: 1463496194
15971 MB RAM detected; reserving 2047 MB for main workspace.
Error: Multiple instances of '_' in sample ID.
If you do not want '_' to be treated as a FID/IID delimiter, use --double-id or
--const-fid to choose a different method of converting VCF sample IDs to PLINK
IDs, or --id-delim to change the FID/IID delimiter.

Should I merge all the VCFs into one file?

modified 9 months ago by zx87548.4k • written 3.5 years ago by lmobuchon40

It would be useful if you posted the commands and errors.

written 3.5 years ago by geek_y10.0k

Sorry! :) This is the command line that I tried and the error it gave:

./plink --noweb --vcf file1.vcf --recode --out plink1

* Unused command line option: --vcf
* Unused command line option: file1.vcf
ERROR: Problem parsing the command line arguments.

written 3.5 years ago by lmobuchon40
9 months ago
zx87548.4k wrote:

As the error suggests, PLINK is treating "_" in sample IDs as a delimiter. The solution is given in GitHub issue #21:

plink --noweb --const-fid 0 --vcf myFile1.vcf --recode --out myPlinkFile

"--const-fid 0" is probably the simplest way; it causes all family IDs to be set to "0", and individual IDs to be set to the ID in the VCF file. (The default behavior is to treat '_' as a delimiter between the FID and IID; this obviously has a problem with multiple underscores.)

I will modify the error message to suggest --const-fid as a workaround.
-- Christopher Chang
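
Since the question involves 80 separate VCF files, the same command can be applied to each file in a loop. The following is only a sketch: it assumes the files sit in the current directory, end in .vcf, and have no spaces in their names. The helper name plink_convert_cmd and the choice of the file's base name as the output prefix are my own, not from the thread. It prints the commands as a dry run instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of PLINK): print the conversion command
# for one VCF, using the file's base name as the output prefix.
plink_convert_cmd() {
    vcf=$1
    prefix=$(basename "$vcf" .vcf)
    printf 'plink --const-fid 0 --vcf %s --recode --out %s\n' "$vcf" "$prefix"
}

# Dry run: print one command per VCF found in the current directory.
for vcf in *.vcf; do
    [ -e "$vcf" ] || continue   # skip the literal pattern when nothing matches
    plink_convert_cmd "$vcf"
done
```

Once the printed commands look right, piping the script's output to sh would run them one by one. This gives one PED/MAP pair per patient; it does not combine them into a single dataset.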

Other relevant flags from the manual for VCF inputs:

VCF files just contain sample IDs, instead of the distinct family and within-family IDs tracked by PLINK. We offer three ways to convert these IDs:

  • --double-id causes both family and within-family IDs to be set to the sample ID.
  • --const-fid converts sample IDs to within-family IDs while setting all family IDs to a single value (default '0').
  • --id-delim causes sample IDs to be parsed as [FID][delimiter][IID]; the default delimiter is '_'. If any sample ID does not contain exactly one instance of the delimiter, an error is normally reported; however, if you have simultaneously specified --double-id or --const-fid, PLINK will fall back on that approach to handle zero-delimiter IDs.

If none of these three flags is present, the loader defaults to --double-id + --id-delim.
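
To make the rules above concrete, here is a small shell model of how a VCF sample ID would map to an FID/IID pair under --double-id, --const-fid, and the default behavior. It is an illustration written from the manual text quoted above, not PLINK's actual code; the function name vcf_id_to_plink and the mode names are made up for this sketch, and a custom --id-delim character is not modelled.

```shell
#!/bin/sh
# Illustrative model only; not PLINK's implementation.
# usage: vcf_id_to_plink MODE SAMPLE_ID   -> prints "FID IID"
vcf_id_to_plink() {
    mode=$1
    id=$2
    # Count the '_' characters in the sample ID.
    n=$(( $(printf '%s' "$id" | tr -cd '_' | wc -c) ))
    case $mode in
        double-id) echo "$id $id" ;;   # FID and IID are both the sample ID
        const-fid) echo "0 $id" ;;     # FID fixed to the default '0'
        default)                       # --double-id + --id-delim with '_'
            if [ "$n" -eq 0 ]; then
                echo "$id $id"         # no delimiter: fall back to double-id
            elif [ "$n" -eq 1 ]; then
                echo "${id%%_*} ${id#*_}"
            else
                echo "Error: Multiple instances of '_' in sample ID." >&2
                return 1
            fi ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

vcf_id_to_plink default FAM1_S1     # prints: FAM1 S1
vcf_id_to_plink const-fid S_1_a     # prints: 0 S_1_a
```

Calling vcf_id_to_plink default S_1_a returns a non-zero status with the same kind of message as the error reported in this thread, which is why --const-fid 0 or --double-id resolves it.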

modified 9 months ago • written 9 months ago by zx87548.4k
