Question: Which simulator to use for generating fastq reads from a population of haploids
Joseph Hughes (Scotland, UK) wrote, 4.3 years ago:

Hi,

I want to generate reads from a set of different haplotype sequences found at different frequencies within a sample. I was wondering whether any Biostar users have experience of doing this and which tools they used. I would prefer to use an existing tool, but there are quite a few out there (I have compiled some in the table below), and I am wondering which one will suit my purpose best.

Any advice is very welcome - thanks.

Joseph

 

| Name | Reference | Single-end | Paired-end | Insert size between reads | Quality score | Customise read length | Coverage bias | Phred score from existing FASTQ | Simulated sequencing errors | Insertion-deletion errors | Source | Platforms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ArtificialFastqGenerator | doi: 10.1371/journal.pone.0049110 | | yes | yes | yes | yes | yes, based on GC | yes | yes | | https://sourceforge.net/projects/artfastqgen/ | Illumina |
| ART | doi: 10.1093/bioinformatics/btr708 | yes | yes | | no | yes | | yes | yes | yes | | Roche 454, Illumina Solexa, Applied Biosystems SOLiD |
| WgSim | doi: 10.1093/bioinformatics/btp352 | | | | dummy quality scores | | | | uniform distribution error | yes, by simulating INDEL polymorphisms | | |
| Mason | Holtgrewe M (2010) Mason - a read simulator for second generation sequencing data. Technical report, FU Berlin. | yes | yes | | yes | | | no | yes | random | | 454, Illumina, Sanger |
| SimSeq | Available: https://github.com/jstjohn/SimSeq. Accessed 2012 October 10th. | yes | yes | yes | | <100bp | | | | | | Illumina |
| pIRS | doi: 10.1093/bioinformatics/bts187 | | yes | | empirical model based on read cycle | | yes, based on GC | | empirical model based on read cycle | yes, with additional tool | | |
| Stampy | doi: 10.1101/gr.111120.110 | | | | | | | | | | | |
| MetaSim | doi: 10.1371/journal.pone.0003373 | yes | | | no | | | | | | | |
| FlowSim | doi: 10.1093/bioinformatics/btq365 | yes | | | yes | | | | | | yes | 454 |
| simNGS | http://www.ebi.ac.uk/goldman-srv/simNGS/ | | | | | | | | | | | |
| Grinder | doi: 10.1093/nar/gks251 | | | | | | | | | | | |
Joseph Hughes (Scotland, UK) wrote, 4.3 years ago:

I started off using ArtificialFastqGenerator, but it first ran out of memory and then, once I provided more memory, it took a very long time to run.

In the end, I found that ART was the most useful program. At least I could get it to work without too much difficulty. I generated fastq reads from a number of different references at different depths of coverage and then mixed the different sets together.
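For anyone wanting to reproduce the mixing step, here is a minimal Python sketch of one way it could be done; it is not the exact script used above. It splits a total fold coverage across haplotypes in proportion to their frequencies, builds one `art_illumina` command per haplotype, and pools the resulting FASTQ files. The file names (`hapA.fa` etc.), the frequencies, the `HS25` profile, the read length and the fragment-size values are all made-up assumptions, and the `_1.fq`/`_2.fq` names rely on ART appending `1.fq`/`2.fq` to the output prefix in paired-end mode, which you should check against your ART version.

```python
import shutil
from pathlib import Path


def per_haplotype_coverage(frequencies, total_coverage):
    """Split a total fold coverage across haplotypes in proportion to their frequencies."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    return {name: total_coverage * freq / total for name, freq in frequencies.items()}


def art_command(fasta, fold_coverage, out_prefix,
                read_length=150, mean_fragment=300, sd_fragment=30):
    """Build (but do not run) a paired-end art_illumina command line.

    The profile and the numeric defaults are illustrative assumptions.
    """
    return [
        "art_illumina", "-ss", "HS25", "-p",
        "-i", str(fasta),
        "-l", str(read_length),
        "-f", f"{fold_coverage:.2f}",
        "-m", str(mean_fragment),
        "-s", str(sd_fragment),
        "-o", str(out_prefix),
    ]


def concatenate(parts, destination):
    """Append several FASTQ files, in the given order, into one pooled file."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, out)


if __name__ == "__main__":
    # Hypothetical haplotype FASTA files and their frequencies in the sample.
    frequencies = {"hapA.fa": 0.7, "hapB.fa": 0.2, "hapC.fa": 0.1}
    coverage = per_haplotype_coverage(frequencies, total_coverage=1000)
    for fasta, fold in coverage.items():
        prefix = Path(fasta).stem + "_"
        print(" ".join(art_command(fasta, fold, prefix)))
    # After running the printed commands, pool each mate file separately and
    # in the same haplotype order so that read pairs stay in sync, e.g.:
    # concatenate(["hapA_1.fq", "hapB_1.fq", "hapC_1.fq"], "mix_1.fq")
    # concatenate(["hapA_2.fq", "hapB_2.fq", "hapC_2.fq"], "mix_2.fq")
```

Simple concatenation leaves the reads grouped by haplotype, which is fine for most mappers; shuffle the pairs afterwards if read order matters for your downstream tool, and make sure read names stay unique across haplotypes.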

UPDATE: The following blog post is related and very useful 

http://scottmyourstone.blogspot.co.uk/2013/10/read-simulators-review-with-emphasis-on.html


Don't you think that ArtificialFastqGenerator excels over ART? How long did it take you to run ArtificialFastqGenerator? What was the specification of the computer that ran it?

Comment written 3.9 years ago by Medhat