Question: Convert 80 VCF to PED and MAP plink files
3.5 years ago, lmobuchon wrote:

Hi everyone,

I am new to PLINK, so apologies for the naive question. I have 80 VCF files (one per patient), and I would like to create PLINK files (MAP and PED) for further analyses. I tried opening them with plink2, converting them to BCF with BCFtools, and creating PLINK files with VCFtools, but all of these gave me errors. Do I need to merge the 80 VCFs first? Do you have any other ideas?

Thanks a lot in advance. Best, Lenha


You should be able to convert a single VCF to plink format using plink --vcf <input_vcf_name> --recode --out <output_plink_name>. Are you able to convert one file successfully, or if not, what errors does it give you?

written 3.5 years ago by leekaiinthesky

Thank you very much! Actually, I tried:

./plink --vcf *.vcf --out all

And the error is:

Random number seed: 1463496194
15971 MB RAM detected; reserving 2047 MB for main workspace.
Error: Multiple instances of '_' in sample ID.
If you do not want '_' to be treated as a FID/IID delimiter, use --double-id or
--const-fid to choose a different method of converting VCF sample IDs to PLINK
IDs, or --id-delim to change the FID/IID delimiter.

Should I merge all the VCFs into one file?

written 3.5 years ago by lmobuchon • modified 9 months ago by zx8754
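On the merging question: one possible route is to compress and index each per-patient VCF, merge them with bcftools, and then convert the merged file with PLINK. The sketch below only illustrates that idea and is not something posted in this thread. It assumes bcftools and PLINK 1.9 (the first PLINK release with --vcf) are on the PATH, that each VCF holds one sample, and that all sample IDs are distinct. The helper names run and merge_and_convert and the output file names are hypothetical. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: merge per-sample VCFs with bcftools, then convert with PLINK 1.9.
# Assumptions (not from the thread): bcftools and plink are on PATH,
# each input VCF holds one sample, and all sample IDs are distinct.

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: merge_and_convert OUT_PREFIX FILE1.vcf FILE2.vcf ...
merge_and_convert() {
    local out=$1
    shift
    local list="${out}.vcf_list.txt"
    : > "$list"    # start with an empty file list
    local vcf
    for vcf in "$@"; do
        # bcftools merge needs bgzip-compressed, indexed inputs.
        run bcftools view -Oz -o "${vcf}.gz" "$vcf"
        run bcftools index -t "${vcf}.gz"
        echo "${vcf}.gz" >> "$list"
    done
    # Merge all listed files into one multi-sample VCF.
    run bcftools merge -l "$list" -Oz -o "${out}.merged.vcf.gz"
    # --const-fid 0 sidesteps the '_' delimiter error shown in the log above.
    run plink --vcf "${out}.merged.vcf.gz" --const-fid 0 --recode --out "$out"
}
```

Calling merge_and_convert all *.vcf prints the planned commands; DRY_RUN=0 merge_and_convert all *.vcf runs them. One caveat to keep in mind: when single-sample VCFs are merged, a site that is absent from one patient's file normally ends up as a missing genotype for that patient rather than homozygous reference, which may matter for downstream analyses.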

It would be useful if you posted the commands and errors.

written 3.5 years ago by geek_y

Sorry! :) This is the command line that I tried, and the error:

./plink --noweb --vcf file1.vcf --recode --out plink1

* Unused command line option: --vcf
* Unused command line option: file1.vcf
ERROR: Problem parsing the command line arguments.

written 3.5 years ago by lmobuchon
9 months ago, zx8754 (London) wrote:

As the error suggests, plink is treating "_" in sample IDs as a delimiter. The solution is given in GitHub issue #21:

plink --noweb --const-fid 0 --vcf myFile1.vcf --recode --out myPlinkFile

"--const-fid 0" is probably the simplest way; it causes all family IDs to be set to "0", and individual IDs to be set to the ID in the VCF file. (The default behavior is to treat '_' as a delimiter between the FID and IID; this obviously has a problem with multiple underscores.)

I will modify the error message to suggest --const-fid as a workaround.
-- Christopher Chang
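Before picking a flag, it can help to see which sample IDs actually contain underscores. The following is a minimal sketch, assuming an uncompressed VCF whose #CHROM header line is tab-separated with sample IDs from column 10 onward (as in the VCF specification). The function name sample_id_underscores is hypothetical and not part of PLINK.

```shell
#!/usr/bin/env bash
# Usage: sample_id_underscores FILE.vcf
# Prints each sample ID from the #CHROM header line, a tab, and the
# number of underscores in that ID. Assumes an uncompressed VCF.
sample_id_underscores() {
    awk -F'\t' '
        /^#CHROM/ {
            # Columns 1-9 are fixed VCF fields; sample IDs start at column 10.
            for (i = 10; i <= NF; i++) {
                id = $i
                n = gsub(/_/, "_", id)   # gsub returns the number of matches
                print $i "\t" n
            }
            exit                         # the header line is all we need
        }' "$1"
}
```

Any ID reported with a count other than 1 cannot be split into FID and IID by the default '_' delimiter, which is the situation the error message describes.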


Other relevant flags from the manual for VCF inputs:

VCF files just contain sample IDs, instead of the distinct family and within-family IDs tracked by PLINK. We offer three ways to convert these IDs:

  • --double-id causes both family and within-family IDs to be set to the sample ID.
  • --const-fid converts sample IDs to within-family IDs while setting all family IDs to a single value (default '0').
  • --id-delim causes sample IDs to be parsed as [FID][delimiter][IID]; the default delimiter is '_'. If any sample ID does not contain exactly one instance of the delimiter, an error is normally reported; however, if you have simultaneously specified --double-id or --const-fid, PLINK will fall back on that approach to handle zero-delimiter IDs.

If none of these three flags is present, the loader defaults to --double-id + --id-delim
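To make the three conversions concrete, here is a small illustration of what each one does to a sample ID, written as a shell function. It mirrors only the manual text quoted above and is not PLINK's own code: the function name vcf_id_to_plink is hypothetical, the delimiter is fixed to '_', and the fallback behaviour for combined flags is not modelled.

```shell
#!/usr/bin/env bash
# Usage: vcf_id_to_plink MODE SAMPLE_ID
# Prints "FID IID" as described for each flag in the manual excerpt.
# Illustration only; it does not reproduce PLINK's implementation.
vcf_id_to_plink() {
    local mode=$1 id=$2
    case $mode in
        double-id)
            # Family and within-family ID are both the sample ID.
            echo "$id $id"
            ;;
        const-fid)
            # Family ID is a constant (default '0'); IID is the sample ID.
            echo "0 $id"
            ;;
        id-delim)
            # Split as [FID]_[IID]; exactly one '_' is required.
            local stripped=${id//_/}
            if [ $(( ${#id} - ${#stripped} )) -ne 1 ]; then
                echo "Error: expected exactly one '_' in '$id'" >&2
                return 1
            fi
            echo "${id%%_*} ${id#*_}"
            ;;
        *)
            echo "Unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, vcf_id_to_plink const-fid patient_01_tumor prints "0 patient_01_tumor", while vcf_id_to_plink id-delim patient_01_tumor fails because the ID contains two underscores (the sample ID here is made up for illustration).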
