Question: PGDSpider - What Is The Format Of The "Population Definition" File - VCF To Bayescan?
8.7 years ago, Francois Olivier Hébert wrote:

Hi everybody !

It's a simple question, but I can't find any information on it, either on the Internet or in the PDF manual for PGDSpider. I'm trying to convert a VCF file into a Bayescan input file with PGDSpider. The VCF file, obtained with SAMtools, contains the SNP information for 24 individuals: 12 in one population and 12 in another. When I edit the SPID file in PGDSpider just before launching the conversion task, I'm asked whether I want to include a file with "population definitions". Then I have to select an input file that contains these population definitions. Since I want a Bayescan input file, I need this population information: the Bayescan file is supposed to contain the SNP counts per population, so PGDSpider has to know which samples in the VCF file belong to which population. The problem is that I don't know what this population definitions file is supposed to look like, and there is no example in the "example" folder that ships with the program.

I have tried several input formats, but they all got rejected. Can somebody help me? I thought of writing a Python script to parse the VCF file and create the Bayescan input file myself, but it would be a lot faster and easier to use PGDSpider.

Thanks for any help! I appreciate it.

Cheers !

7.5 years ago, heidi.lischer wrote:


The "population definition" file defines which individual belongs to which population. It is a simple file with all individual names in the first column and the corresponding population names in the second column (columns are whitespace-separated):

Ind_1  pop1
Ind_2  pop1
Ind_3  pop2
Ind_4  pop4
Ind_5  pop2 

A short description of the file can be found in the PGDSpider manual under the VCF format (Special PGDSpider input/output questions).
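
For many samples, the two-column file described above can be generated rather than typed by hand. The following is only a minimal sketch, assuming the individual names must match the sample columns of the VCF "#CHROM" header line and that a tab is an acceptable whitespace separator. The function names, the in-memory VCF header and the sample-to-population mapping are all hypothetical, chosen for illustration.

```python
import io


def read_vcf_samples(handle):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def format_population_definitions(sample_to_pop):
    """Build one 'individual<TAB>population' line per sample."""
    return "".join(f"{sample}\t{pop}\n" for sample, pop in sample_to_pop.items())


# Hypothetical VCF header with three samples (assumption, not real data).
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tInd_1\tInd_2\tInd_3\n"
)

# Hypothetical assignment of samples to populations (assumption).
populations = {"Ind_1": "pop1", "Ind_2": "pop1", "Ind_3": "pop2"}

samples = read_vcf_samples(io.StringIO(vcf_text))
pop_def = format_population_definitions({s: populations[s] for s in samples})
print(pop_def, end="")
```

To use it on a real file, open the VCF with `open(path)` instead of `io.StringIO` and write `pop_def` to a text file that is then selected in the SPID editor.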

Cheers Heidi

8.5 years ago, rodriguesmonic wrote:

Hello Francois,

I am having the same problem: I have a VCF file that I want to convert to a Bayescan input file. Did you find out what the population definitions file for PGDSpider should look like?

Many thanks



No, I haven't found any direct solution to that problem with PGDSpider. Since I didn't get any answer, instead of trying to fix it I wrote my own custom Python scripts to parse the VCF file, produce a genotype matrix file, and then parse this genotype matrix file to create the input file for Bayescan.

The only problem is that my scripts are extremely custom, i.e. adapted to my file names and population names. If you are completely stuck with this, you can quickly tell me what kind of data you have and how many populations, and maybe give me the header of your VCF file (i.e. all the lines that start with "##" and the line that starts with "#CHROM", which includes the columns that contain the names of your SAM files).
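
For readers who want to try the script route, the core step (counting alleles per population from the genotype columns) can be sketched as below. This is not the script mentioned above, only a minimal illustration. It assumes biallelic SNPs, a "GT" entry in the FORMAT column, and a hypothetical `sample_to_pop` mapping. It stops at the per-population counts and does not write the Bayescan file itself.

```python
from collections import defaultdict


def count_alleles(vcf_lines, sample_to_pop):
    """Count REF (0) and ALT (1) alleles per population for each site."""
    samples = []
    counts = []  # one dict per site: population -> [ref_count, alt_count]
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")
        site = defaultdict(lambda: [0, 0])
        for sample, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                # Missing calls (".") and non-biallelic alleles are skipped.
                if allele in ("0", "1"):
                    site[sample_to_pop[sample]][int(allele)] += 1
        counts.append(dict(site))
    return counts


# Hypothetical two-site VCF with three samples (assumption, not real data).
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tInd_1\tInd_2\tInd_3",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:10\t0/1:8\t1/1:12",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.\t0/0",
]
sample_to_pop = {"Ind_1": "pop1", "Ind_2": "pop1", "Ind_3": "pop2"}

allele_counts = count_alleles(vcf_lines, sample_to_pop)
for site_number, site in enumerate(allele_counts, start=1):
    print(site_number, site)
```

Keeping the counts in a plain list of dictionaries makes it easy to check them by eye before formatting them for any downstream program.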

Reply written 8.5 years ago by Francois Olivier Hébert
7.4 years ago, daniel.croll wrote:

I've just encountered the same problem. With PGDSpider 2.0.4, converting directly from VCF to other formats ignored the population definition file. Surprisingly, the only conversion that worked was converting first from VCF to the PGD format and then from PGD to any other format. This way, the population definition file was actually used properly. Could this be a bug?

Cheers, Daniel
