Question: How to indicate population structure in EggLib?
0
gravatar for remi.b.md
2.1 years ago by
remi.b.md60
remi.b.md60 wrote:

In Python, I am using EggLib. I am trying to calculate Jost's D value per SNP found in a VCF file.

Data

Data is here in VCF format. The data set is small, there are 2 populations, 100 individuals per population and 6 SNPs (all on chromosome 1).

Each individual is named Pp.Ii, where p is the population index it belongs to and i is the individual index.

Issue

I fail to understand the format of the Structure object explained here on the EggLib website. I suppose my misunderstanding is due to the fact that the example refers to a FASTA file while I am using a VCF file and I fail to extend the logic of the Structure object to my case.

Code

### Read the vcf file ###
vcf = egglib.io.VcfParser("MyData.vcf") 

### Create the `Structure` object ###
# Dictionary for a given cluster. There is only one cluster.
dcluster = {}            
# Loop through each population 
for popIndex in [0,1]:  
    # dictionnary for a given population. There are two populations
    dpop = {}            
    # Loop through each individual
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
            # A single list to define an individual
        dpop[IndIndex] = [IndIndex*2, IndIndex*2 + 1]
    dcluster[popIndex] = dpop

Structure = {0: dcluster}

### Define the population structure ###
egglib.stats.make_structure(Structure, None) 

### Configurate the 'ComputeStats' object ###
cs = egglib.stats.ComputeStats()
cs.configure(only_diallelic=False)
cs.add_stats('Dj') # Jost's D

### Isolate a SNP ###
vcf.next()
site = egglib.stats.site_from_vcf(vcf)

### Calculate Jost's D ###
cs.process_site(site, struct=Structure)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/egglib/stats/_cstats.py", line 431, in process_site
    self._frq.process_site(site, struct=struct)
  File "/Library/Python/2.7/site-packages/egglib/stats/_freq.py", line 159, in process_site
    if sum(struct) != site._obj.get_ning(): raise ValueError, 'invalid structure (sample size is required to match)'
ValueError: invalid structure (sample size is required to match)
snp egglib python vcf • 715 views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by remi.b.md60
1
gravatar for remi.b.md
2.1 years ago by
remi.b.md60
remi.b.md60 wrote:

There are two errors

First error

The function make_structure returns the Structure object but does not save it within stats. You therefore have to save this output.

Structure = egglib.stats.make_structure(struct, None)

Second error

The Structure object must designate haploids. Therefore, create the dictionary as

dcluster = {}            
for popIndex in [0,1]:  
    dpop = {}            
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
        dpop[IndIndex] = [IndIndex]
    dcluster[popIndex] = dpop

struct = {0: dcluster}
ADD COMMENTlink written 2.1 years ago by remi.b.md60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 945 users visited in the last hour