Tool: Create realistic VCF genomic data in just a few lines of Python code
8 months ago
Rémy ▴ 10

Hi all!

I recently developed a Python 3 library called HaploDynamics for simulating realistic variant call format (VCF) data.

The library is designed to create synthetic datasets for benchmarking and pipeline testing. It does not require any input files, and users can specify the genetic architecture of the dataset they want to generate.

https://github.com/remytuyeras/HaploDynamics

For example, HaploDynamics can generate VCF data for different populations, with different levels of linkage disequilibrium (LD) and different proportions of rare mutations. The following script generates a VCF file containing simulated diploid genotypes for a population of 1000 individuals, with an allele frequency spectrum similar to that of African/South Asian populations. The dataset consists of a sequence of LD blocks of length 20 kb, 5 kb, 20 kb, 35 kb, 30 kb, and 15 kb.

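import HaploDynamics.HaploDX as hdx

# LD-block lengths in kb, simulated for a population of 1000 individuals
simulated_data = hdx.genmatrix([20, 5, 20, 35, 30, 15], strength=1, population=0.1, Npop=1000)
# Write the simulated genotypes to a compressed VCF file
hdx.create_vcfgz("genomic-data.simulation.v1", *simulated_data)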
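
If you want a quick sanity check on a generated file, here is a minimal sketch that counts samples and variant records in a gzipped VCF using only the standard library. The helper name `summarize_vcf` is hypothetical and not part of HaploDynamics; it only assumes the standard VCF layout (9 fixed columns on the `#CHROM` header line, followed by one column per sample).

```python
import gzip


def summarize_vcf(path):
    """Return (number of samples, number of variant records) for a gzipped VCF.

    Hypothetical helper, not part of HaploDynamics. Assumes the standard
    tab-separated VCF layout.
    """
    n_samples = 0
    n_variants = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                # Meta-information lines carry no genotype data
                continue
            if line.startswith("#CHROM"):
                # 9 fixed columns (CHROM..FORMAT), then one column per sample
                n_samples = max(len(line.rstrip("\n").split("\t")) - 9, 0)
            elif line.strip():
                # Every remaining non-empty line is one variant record
                n_variants += 1
    return n_samples, n_variants
```

Assuming the script above writes its output to `genomic-data.simulation.v1.vcf.gz` (the exact file name is my assumption here), `summarize_vcf("genomic-data.simulation.v1.vcf.gz")` should report 1000 samples if the file follows the standard layout.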

The library is still under development, but it is already fully functional. In the future, I plan to add more features such as the ability to generate large collections of files using multiprocessing.

Overall, HaploDynamics can help you save time and bandwidth, improve the robustness of your applications, control storage space, and avoid the red tape involved in obtaining real genomic data.

I hope you find the library useful!
