Question: NGS reads simulation
4
2.2 years ago by wangyi2412, China
wangyi2412 wrote:

Hi, everyone! 

I am evaluating the performance of my algorithm, for which I need simulated data. I looked at the simulators used by the 1000 Genomes Project, but people there said they were outdated and suggested finding a newer one.

I have heard of a program called ART, but cannot find it on the web.

I have also read several similar papers that used simulation, but none of them stated which existing software or data sets they used.

Besides software or data sets for running simulations out of the box, I would like to know more about the details of, and the principles behind, the simulation.

The settings are as follows:

1. First, construct the diploid genome of a human (considering only SNPs and indels, not other types of variation).

2. Generate templates with Gaussian-distributed lengths, drawn with equal probability from the four strands of DNA (the +/- strands of the two homologous chromosomes), with an error rate similar to that of a sequencing machine such as the Illumina HiSeq 2000.

3. Take 100 bp reads from each template.
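
Steps 2 and 3 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how ART or any other published simulator works: the function names, the default template length (mean 300 bp, s.d. 30 bp) and the uniform 1% substitution error are all placeholders. Real tools typically use empirical, position-dependent error profiles.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N is left as N)."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_template(haplotypes, mean_len, sd_len, read_len, rng):
    """Cut one template from one of the four strands, chosen uniformly."""
    hap = rng.choice(haplotypes)  # one of the two homologous copies
    # Gaussian template length, clamped so that a full read always fits.
    # Assumes every haplotype is at least read_len bases long.
    length = int(round(rng.gauss(mean_len, sd_len)))
    length = max(read_len, min(length, len(hap)))
    start = rng.randint(0, len(hap) - length)
    template = hap[start:start + length]
    if rng.random() < 0.5:  # minus strand with probability 1/2
        template = reverse_complement(template)
    return template


def add_errors(read, error_rate, rng):
    """Apply uniform substitution errors (a deliberate simplification)."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_reads(haplotypes, n_reads, read_len=100, mean_len=300,
                   sd_len=30, error_rate=0.01, seed=0):
    """Return n_reads single-end reads taken from the 5' end of each template."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        template = sample_template(haplotypes, mean_len, sd_len, read_len, rng)
        reads.append(add_errors(template[:read_len], error_rate, rng))
    return reads
```

Here `haplotypes` is a list holding the two homologous sequences as strings, for example `simulate_reads([hap_a, hap_b], n_reads=1000)`. Paired-end reads would additionally take the reverse complement of the last `read_len` bases of each template.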

The key question is how to construct the diploid genome so that it best resembles a "typical" person in the population under study. Does anyone have any ideas? Should I randomly select base pairs to differ from the reference with a probability equal to the mutation rate, say 1%? The mutation rate should differ between regions, though, so how can that be simulated? Or, as long as the study does not depend on the distribution of the variants, can this be ignored?
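
One simple way to let the variant rate differ between regions is to give each fixed-size window of the reference its own per-base probability. The sketch below is a hypothetical illustration only: `make_diploid`, the window scheme, the 10% indel share, the one-third homozygous share and the 1 bp indel size are all assumptions, and the rates themselves are placeholders rather than estimates for any real population.

```python
import random


def make_diploid(ref, window_rates, window_size,
                 indel_fraction=0.1, hom_fraction=1 / 3, seed=0):
    """Derive two haplotypes from `ref` with region-specific variant rates.

    window_rates[i] is the per-base variant probability in window i;
    positions beyond the last window reuse the last rate.
    Returns (hap_a, hap_b, variants), where each variant is
    (position, ref_allele, alt_allele, (on_hap_a, on_hap_b)).
    """
    rng = random.Random(seed)
    hap_a, hap_b, variants = [], [], []
    for pos, base in enumerate(ref):
        rate = window_rates[min(pos // window_size, len(window_rates) - 1)]
        alleles = [base, base]
        if rng.random() < rate:
            if rng.random() < indel_fraction:
                # 1 bp deletion (empty allele) or 1 bp insertion after the base
                if rng.random() < 0.5:
                    alt = ""
                else:
                    alt = base + rng.choice("ACGT")
            else:
                # SNP: substitute a different base
                alt = rng.choice([b for b in "ACGT" if b != base])
            if rng.random() < hom_fraction:
                alleles = [alt, alt]            # homozygous variant
            else:
                alleles[rng.randrange(2)] = alt  # heterozygous, random phase
            variants.append((pos, base, alt,
                             (alleles[0] == alt, alleles[1] == alt)))
        hap_a.append(alleles[0])
        hap_b.append(alleles[1])
    return "".join(hap_a), "".join(hap_b), variants
```

For example, `make_diploid(ref, [0.0005, 0.002], 1_000_000)` would apply a lower rate to the first megabase and a higher one to the rest. An alternative that avoids choosing rates at all is to apply the variants of a real individual from a public call set to the reference. The returned `variants` list serves as the truth set when evaluating a caller.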

Thank you very much!

 

Yi   

 

 

 

sequencing simulation • 2.8k views
modified 2.2 years ago by Felix Francis • written 2.2 years ago by wangyi2412
15
2.2 years ago by Felix Francis, United States/University of Delaware
Felix Francis wrote:

Here is a list of genetic simulation resources. I believe several of them would suit your needs.

 

 

Software resource, brief description, and homepage:
Aladyn Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation
http://www.katja-schiffers.eu/research.html
ALF A Simulation Framework for Genome Evolution 
http://www.cbrg.ethz.ch/alf
ART ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking real sequencing process with empirical error models or quality profiles. 
http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
Bayesian Serial SimCoal Bayesian Serial SimCoal, (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider. 
http://www.stanford.edu/group/hadlylab/ssc/index.html
BaySICS An integral platform with a graphical interface for statistical inference based on approximate Bayesian computation. 
https://sites.google.com/site/baysicsabc/
BEERS BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternate splicing from RNA-Seq data 
http://cbil.upenn.edu/beers/
BOTTLENECK Bottleneck is a program for detecting recent effective population size reductions from allele data frequencies 
http://www.ensam.inra.fr/urlb/bottleneck/bottleneck.html
BottleSim BottleSim is a computer simulation program for simulating the process of population bottlenecks 
http://chkuo.name/software/bottlesim.html
CASS Protein Sequence Simulation 
http://www.wyomingbioinformatics.org/liberlesgroup/cass/
CDPOP CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior. 
http://cel.dbs.umt.edu/cdpop
Classical Genetics Simulator Web-based simulation software 
http://www.cgslab.com/
CoaSim CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models.
http://users-birc.au.dk/mailund/coasim/index.html
cosi The cosi package is written in C and is available as a tar file. 
http://www.broadinstitute.org/~sfs/cosi/
CS-PSeq-Gen A program to simulate the evolution of protein sequences under the constraints of the information of a particular reconstructed phylogeny 
http://bioserv.rpbs.univ-paris-diderot.fr/software/cs-pseq-gen.html
DAWG An application designed to simulate the evolution of recombinant DNA sequences in continuous time 
http://scit.us/projects/dawg
Easypop EASYPOP is an individual based model intended to simulate datasets under a very broad range of conditions 
http://www.unil.ch/dee/page36926_fr.html
EggLib EggLib is a C++/Python library and program package for evolutionary genetics and genomics. 
http://egglib.sourceforge.net/
EpiSIM EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis 
https://sourceforge.net/projects/episimsimulator/files/
EvolSimulator A simulation test bed for hypotheses of genome evolution 
http://acb.qfab.org/acb/evolsim/
EvolveAGene A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions 
http://bellinghamresearchinstitute.com/software/index.html
fastsimcoal A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios
http://cmpg.unibe.ch/software/fastsimcoal/
FastSLINK Simulation of Marker and Phenotype Data in Pedigrees 
http://watson.hgen.pitt.edu/
FFPopSim C++/Python library for population genetics. 
http://webdav.tuebingen.mpg.de/ffpopsim/
FLUX SIMULATOR The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters. 
http://flux.sammeth.net/simulator.html
forqs Forward-in-time simulation of Recombination, Quantitative Traits, and Selection 
https://bitbucket.org/dkessner/forqs
ForSim ForSim: A Forward Evolutionary Computer Simulation 
http://anth.la.psu.edu/research/weiss-lab/research/research
ForwSim The program given below is based on the algorithm described in Padhukasahasram et al. 2008 to simulate genetic drift in a standard Wright-Fisher process. 
http://badri-populationgeneticsimulators.blogspot.com/
FPG Forward Population Genetic simulation 
https://bio.cst.temple.edu/~hey/software/software.htm#fpg
FREGENE FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations. 
http://www.ebi.ac.uk/projects/bargen
FIGG FIGG is a genome simulation tool that uses known or theorized variation frequency, per a given fragment size and grouped by GC content across a genome to model new genomes in FASTA format while tracking applied mutations for use in analysis 
http://insilicogenome.sourceforge.net/
fwdpp A C++ template library for implementing efficient forward simulations. 
http://molpopgen.github.io/fwdpp/
GAMETES Genetic Architecture Model Emulator for Testing and Evaluating Software: Simulates complex SNP models with pure, strict epistatic interactions with n-loci. 
http://sourceforge.net/projects/gametes/?source=navbar
GASP Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user specified models. 
http://research.nhgri.nih.gov/gasp/
GCTA Genome-wide Complex Trait Analysis 
http://www.complextraitgenomics.com/software/gcta/download.html
GemSIM Next generation sequencing read simulator 
http://sourceforge.net/projects/gemsim/
GeneArtisan Simulation of Markers in Case-Control Study Designs 
http://www.rannala.org/?page_id=241
GENOME A rapid coalescent-based whole genome simulator 
http://www.sph.umich.edu/csg/liang/genome/
GenomePop2 GenomePop2 is a specialization of the program GenomePop just to manage SNPs under more flexible and useful settings. If you need models with more than 2 alleles please use the GenomePop program version. 
http://webs.uvigo.es/acraaj/genomepop2.htm
GenomeSimla GenomeSIMLA is currently under development; however, we have a beta release that we are asking to be tested
http://chgr.mc.vanderbilt.edu/genomesimla/
GENS2 Simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions. 
https://sourceforge.net/projects/gensim/
GWAsimulator A rapid whole genome simulation program 
http://biostat.mc.vanderbilt.edu/wiki/main/gwasimulator
HAP-SAMPLE An association simulator for candidate regions or genome scans 
http://www.hapsample.org/
HAPGEN A simulator for the simulation of case control datasets at SNP markers 
https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
HapSim A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients 
http://cran.r-project.org/web/packages/hapsim/index.html
HAPSIMU A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model 
http://l.web.umkc.edu/liujian/
IBDsim IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models. 
http://raphael.leblois.free.fr/
indel-Seq-Gen A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies 
http://bioinfolab.unl.edu/~cstrope/isg/
Indelible A powerful and flexible simulator of biological evolution 
http://abacus.gene.ucl.ac.uk/software/indelible/
invertFREGENE InvertFREGENE is a forward-in-time simulator of inversions in population genetic data 
http://www.ebi.ac.uk/projects/bargen/
kernelPop A spatially explicit population genetic simulation engine
http://cran.r-project.org/src/contrib/archive/kernelpop/
MaCS Markovian Coalescent Simulator 
http://www-hsc.usc.edu/~garykche/
Mason A package for the simulation of nucleotide data. 
http://www.seqan.de/projects/mason/
mbs A modification of Hudson's ms software that generates samples of DNA sequences with a biallelic site under selection
http://www.sendou.soken.ac.jp/esb/innan/innanlab/software.html
Mendel's Accountant Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine 
http://mendelsaccount.sourceforge.net/
MetaSim A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets 
http://ab.inf.uni-tuebingen.de/software/metasim/
mlcoalsim Multilocus Coalescent Simulations 
http://code.google.com/p/mlcoalsim-v1/
ms The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets. 
http://home.uchicago.edu/~rhudson1/source/mksamples.html
msHOT The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets. 
http://home.uchicago.edu/~rhudson1/
msms A coalescent simulation tool with selection.
http://www.mabs.at/ewing/msms/index.shtml
MySSP A program for the simulation of DNA sequence evolution across a phylogenetic tree 
http://www.rosenberglab.net/software.php
Nemo A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework. 
http://nemo2.sourceforge.net/
NetRecodon Coalescent simulation of coding DNA sequences with recombination (inter and intracodon), migration and demography 
http://code.google.com/p/netrecodon/
PEDAGOG Software for simulating eco-evolutionary population dynamics 
https://bcrc.bio.umass.edu/pedigreesoftware/node/5
phenosim A tool to add phenotypes to simulated genotypes 
http://evoplant.uni-hohenheim.de/doku.php?id=software:software
PhyloSim An R package for the Monte Carlo simulation of sequence evolution 
http://www.ebi.ac.uk/goldman-srv/phylosim/
pIRS Profile-based Illumina pair-end reads simulator 
https://code.google.com/p/pirs/
ProteinEvolver Simulation of protein evolution along phylogenies under structure-based substitution models 
http://code.google.com/p/proteinevolver/
QMSim QTL and Marker Simulator 
http://www.aps.uoguelph.ca/~msargol/qmsim/
quantiNEMO An individual-based program for the analysis of quantitative traits with explicit genetic architecture potentially under selection in a structured population 
http://www2.unil.ch/popgen/softwares/quantinemo/
RECOAL Simulates new haplotype data from a reference population of haplotypes. 
ftp://popgen.usc.edu/
Recodon Coalescent simulation of coding DNA sequences with recombination, migration and demography 
http://code.google.com/p/recodon/
rlsim A package for simulating RNA-seq library preparation with parameter estimation 
http://bit.ly/rlsim-git
Rmetasim Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R 
http://cran.r-project.org/web/packages/rmetasim/index.html
RNA Seq Simulator RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets. 
http://useq.sourceforge.net/cmdlnmenus.html#rnaseqsimulator
Rose Random model of sequence evolution 
http://bibiserv.techfak.uni-bielefeld.de/rose/
scrm A coalescent simulator optimized for long sequences and large samples. 
https://scrm.github.io/
SelSim SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection
http://www.well.ox.ac.uk/~spencer/selsim/
Seq-Gen An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees. 
http://tree.bio.ed.ac.uk/software/seqgen/
SEQPower Statistical power analysis for sequence-based association studies 
http://bioinformatics.org/spower/
SeqSIMLA SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated. 
http://seqsimla.sourceforge.net/
Serial NetEvolve A flexible utility for generating serially-sampled sequences along a tree or recombinant network 
http://biorg.cis.fiu.edu/sne/
SFS_CODE SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects. 
http://sfscode.sourceforge.net/sfs_code/index/index.html
SIBSIM Quantitative phenotype simulation in extended pedigrees 
http://sourceforge.net/projects/sibsim/
SimAdapt A spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton. 
http://www.openabm.org/model/3137
SIMCOAL2 A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models 
http://cmpg.unibe.ch/software/simcoal2/
SimCopy An R package simulating the evolution of copy number profiles along a tree. 
http://bit.ly/simcopy
SIMLA SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. 
http://www.chg.duke.edu/research/simla.html
SimPed A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures 
http://bioinformatics.org/simped/
Simprot A program to simulate protein evolution by substitution, insertion and deletion 
http://www.uhnresearch.ca/labs/tillier/software.htm#3
SimRare Rare variant simulation and analysis tool 
http://code.google.com/p/simrare/
simuGWAS A forward-time simulator that simulates realistic samples for genome-wide association studies. 
http://simupop.sourceforge.net/cookbook/simugwas
simuPOP simuPOP is a general-purpose individual-based forward-time population genetics simulation environment. 
http://simupop.sourceforge.net/
SISSI A software tool to generate data of related sequences along a given phylogeny, taking into account user defined system of neighbourhoods and instantaneous rate matrices. 
http://www.cibiv.at/software/sissi/
SMARTPOP Simulating Mating Alliance as a Reproductive Tactic for Populations 
http://smartpop.sourceforge.net/
SNPsim Coalescent simulation of hotspot recombination 
http://code.google.com/p/phylosoftware/
SPIP SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user 
http://swfsc.noaa.gov/textblock.aspx?division=fed&id=3434
Splatche Spatial and Temporal Coalescences in Heterogeneous Environment 
http://www.splatche.com/
srv Simulator of Rare Variants (srv) is a simulator for the introduction and evolution of (rare) genetic variants.
http://simupop.sourceforge.net/cookbook/simurarevariants
SUP SLINK/FastSLINK utility program 
http://mlemire.freeshell.org/software.html
TreesimJ A flexible, forward-time population genetic simulator 
http://code.google.com/p/treesimj/
Vortex VORTEX is an individual-based simulation model for population viability analysis (PVA). 
http://www.vortex9.org/vortex.html
Wessim Whole Exome Sequencing SIMulator 
http://sak042.github.io/wessim/
1

Whoa! That's one gigantic list! Goes to show the richness of just about any subdomain of bioinformatics.

modified 2.2 years ago • written 2.2 years ago by Istvan Albert

BBMap's RandomReads: Generates single-ended or paired Illumina reads, or PacBio reads, from a genome. Also has a metagenome mode.

written 7 weeks ago by Brian Bushnell
3
2.2 years ago by rtliu, New Zealand
rtliu wrote:

Try the read simulators listed on omictools.com, including ART, wgsim, etc.
