How to indicate population structure in EggLib?
1
0
Entering edit mode
7.1 years ago
remi.b.md ▴ 60

In Python, I am using EggLib. I am trying to calculate Jost's D value per SNP found in a VCF file.

Data

Data is here in VCF format. The data set is small, there are 2 populations, 100 individuals per population and 6 SNPs (all on chromosome 1).

Each individual is named Pp.Ii, where p is the population index it belongs to and i is the individual index.

Issue

I fail to understand the format of the Structure object explained here on the EggLib website. I suppose my misunderstanding is due to the fact that the example refers to a FASTA file while I am using a VCF file and I fail to extend the logic of the Structure object to my case.

Code

### Read the vcf file ###
vcf = egglib.io.VcfParser("MyData.vcf") 

### Create the `Structure` object ###
# Dictionary for a given cluster. There is only one cluster.
dcluster = {}            
# Loop through each population 
for popIndex in [0,1]:  
    # dictionnary for a given population. There are two populations
    dpop = {}            
    # Loop through each individual
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
            # A single list to define an individual
        dpop[IndIndex] = [IndIndex*2, IndIndex*2 + 1]
    dcluster[popIndex] = dpop

Structure = {0: dcluster}

### Define the population structure ###
egglib.stats.make_structure(Structure, None) 

### Configurate the 'ComputeStats' object ###
cs = egglib.stats.ComputeStats()
cs.configure(only_diallelic=False)
cs.add_stats('Dj') # Jost's D

### Isolate a SNP ###
vcf.next()
site = egglib.stats.site_from_vcf(vcf)

### Calculate Jost's D ###
cs.process_site(site, struct=Structure)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/egglib/stats/_cstats.py", line 431, in process_site
    self._frq.process_site(site, struct=struct)
  File "/Library/Python/2.7/site-packages/egglib/stats/_freq.py", line 159, in process_site
    if sum(struct) != site._obj.get_ning(): raise ValueError, 'invalid structure (sample size is required to match)'
ValueError: invalid structure (sample size is required to match)
SNP vcf egglib python • 1.5k views
ADD COMMENT
1
Entering edit mode
7.1 years ago
remi.b.md ▴ 60

There are two errors

First error

The function make_structure returns the Structure object but does not save it within stats. You therefore have to save this output.

Structure = egglib.stats.make_structure(struct, None)

Second error

The Structure object must designate haploids. Therefore, create the dictionary as

dcluster = {}            
for popIndex in [0,1]:  
    dpop = {}            
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
        dpop[IndIndex] = [IndIndex]
    dcluster[popIndex] = dpop

struct = {0: dcluster}
ADD COMMENT

Login before adding your answer.

Traffic: 1813 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6