Convert 80 VCF files to PLINK PED and MAP files
7.9 years ago
lmobuchon ▴ 40

Hi everyone,

I am new to PLINK, so I apologize for the naive question. I have 80 VCF files (one per patient), and I would like to create PLINK files (MAP and PED) for further analyses. I tried opening them with plink2, converting them to BCF with BCFtools, and creating PLINK files with VCFtools, but all of these gave me errors. Do you think I need to merge the 80 VCFs first? Do you have any other ideas?

Thanks a lot in advance. Best, Lenha

vcf plink format • 10k views

You should be able to convert a single VCF to plink format using plink --vcf <input_vcf_name> --recode --out <output_plink_name>. Are you able to convert one file successfully, or if not, what errors does it give you?


Thank you very much! Actually, I tried:

./plink --vcf *.vcf --out all

And the error is:

Random number seed: 1463496194
15971 MB RAM detected; reserving 2047 MB for main workspace.
Error: Multiple instances of '_' in sample ID.
If you do not want '_' to be treated as a FID/IID delimiter, use --double-id or
--const-fid to choose a different method of converting VCF sample IDs to PLINK
IDs, or --id-delim to change the FID/IID delimiter.

Should I merge all the VCFs into one file?


It would be useful if you posted the commands and the errors.


Sorry! :) This is the command line that I tried to use, and the error:

./plink --noweb --vcf file1.vcf --recode --out plink1

* Unused command line option: --vcf
* Unused command line option: file1.vcf
ERROR: Problem parsing the command line arguments.

5.2 years ago
zx8754 11k

As the error suggests, plink is treating "_" in sample IDs as a delimiter. The solution is given in GitHub issue #21:

plink --noweb --const-fid 0 --vcf myFile1.vcf --recode --out myPlinkFile

"--const-fid 0" is probably the simplest way; it causes all family IDs to be set to "0", and individual IDs to be set to the ID in the VCF file. (The default behavior is to treat '_' as a delimiter between the FID and IID; this obviously has a problem with multiple underscores.)

I will modify the error message to suggest --const-fid as a workaround.
-- Christopher Chang
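
For the original question (80 per-patient VCFs), one possible route is to merge first and convert once. Below is a rough sketch, not a tested pipeline. It assumes bcftools and a PLINK 1.9 binary are on the PATH (the "Unused command line option: --vcf" message quoted in the thread looks like output from the older PLINK 1.07, which has no --vcf). The function names, the directory layout, and the DRY_RUN switch are my own placeholders, not anything from the thread.

```shell
#!/bin/sh
# Sketch only: merge per-patient VCFs with bcftools, then convert once with PLINK.
# Assumptions: bcftools and PLINK 1.9 are installed; inputs end in .vcf.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

merge_and_convert() {
    vcf_dir=$1    # directory holding the per-patient .vcf files
    work=$2       # scratch/output directory
    mkdir -p "$work"

    set --        # reuse "$@" as the list of compressed VCFs
    for vcf in "$vcf_dir"/*.vcf; do
        [ -e "$vcf" ] || continue                  # glob matched nothing
        gz="$work/$(basename "$vcf").gz"
        run bcftools view -Oz -o "$gz" "$vcf"      # bgzip-compress
        run bcftools index "$gz"                   # merge needs indexed input
        set -- "$@" "$gz"
    done

    if [ "$#" -lt 2 ]; then
        echo "need at least two VCF files to merge" >&2
        return 1
    fi

    run bcftools merge -Oz -o "$work/merged.vcf.gz" "$@"
    # --const-fid 0 keeps the whole VCF sample ID as the IID,
    # so underscores in sample IDs no longer trigger the error.
    run plink --vcf "$work/merged.vcf.gz" --const-fid 0 --recode \
        --out "$work/merged"
}

# Example call (paths are placeholders):
# DRY_RUN=1; merge_and_convert ./vcfs ./work
```

Two caveats worth checking on your own data: bcftools merge expects each file to carry a distinct sample name, and a site that is absent from one patient's VCF ends up as a missing genotype for that patient in the merged file, which may or may not be what you want.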


Other relevant flags from the manual for vcf inputs:

VCF files just contain sample IDs, instead of the distinct family and within-family IDs tracked by PLINK. We offer three ways to convert these IDs:

  • --double-id causes both family and within-family IDs to be set to the sample ID.
  • --const-fid converts sample IDs to within-family IDs while setting all family IDs to a single value (default '0').
  • --id-delim causes sample IDs to be parsed as [FID][delimiter][IID]; the default delimiter is '_'. If any sample ID does not contain exactly one instance of the delimiter, an error is normally reported; however, if you have simultaneously specified --double-id or --const-fid, PLINK will fall back on that approach to handle zero-delimiter IDs.

If none of these three flags is present, the loader defaults to --double-id + --id-delim.
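
To make the three conversions concrete, here is a small shell function that imitates the FID/IID split described above. It is only an illustration of the rules as quoted from the manual, not PLINK's own code. The function name and the sample IDs (P_01_T, FAM1_S1, S1) are made up, and the wording of the zero-delimiter error is mine.

```shell
#!/bin/sh
# Illustration only: mimic how a VCF sample ID becomes "FID IID"
# under each mode described above. Not taken from PLINK itself.

vcf_id_to_plink() {
    mode=$1            # double-id | const-fid | id-delim | default
    id=$2              # VCF sample ID
    delim=${3:-_}      # delimiter for id-delim (default '_')

    case $mode in
        double-id)
            echo "$id $id" ;;
        const-fid)
            echo "0 $id" ;;
        id-delim)
            case $id in
                *"$delim"*) ;;
                *)  echo "Error: no '$delim' in sample ID." >&2
                    return 1 ;;
            esac
            rest=${id#*"$delim"}           # text after the first delimiter
            case $rest in
                *"$delim"*)
                    echo "Error: Multiple instances of '$delim' in sample ID." >&2
                    return 1 ;;
            esac
            echo "${id%%"$delim"*} $rest" ;;
        default)
            # --double-id + --id-delim: IDs without a delimiter fall back
            # to double-id; everything else is split on the delimiter.
            case $id in
                *"$delim"*) vcf_id_to_plink id-delim "$id" "$delim" ;;
                *)          echo "$id $id" ;;
            esac ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2 ;;
    esac
}

vcf_id_to_plink const-fid P_01_T     # prints: 0 P_01_T
vcf_id_to_plink double-id P_01_T     # prints: P_01_T P_01_T
vcf_id_to_plink id-delim FAM1_S1     # prints: FAM1 S1
```

Calling it as `vcf_id_to_plink default P_01_T` fails with the "Multiple instances" message, which mirrors what happened in the thread when no ID flag was given.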
