Question

Which simulator to use for generating fastq reads from a population of haploids

9.3 years ago

Joseph Hughes ★ 3.0k

Hi,

I want to generate reads from a set of different haplotype sequences found at different frequencies within a sample. Do any Biostar users have experience of doing this, and which tools did you use? I would prefer to use an existing tool, but there are quite a few out there (I have compiled some in the table below), and I am wondering which one will suit my purpose best.

Any advice is very welcome - thanks.

Joseph

| Name | Reference | Single-end | Paired-end | Insert size between reads | Quality score | Customise read length | Coverage bias | Phred score from existing FASTQ | Simulated sequencing errors | Insertion-deletion errors | Source | Platforms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ArtificialFastqGenerator | doi: 10.1371/journal.pone.0049110 | - | yes | yes | yes | yes | yes based on GC | yes | yes | - | https://sourceforge.net/projects/artfastqgen/ | Illumina |
| ART | doi: 10.1093/bioinformatics/btr708 | yes | yes | - | no | yes | - | yes | yes | yes | - | Roche 454, Illumina Solexa, Applied Biosystem SOLiD |
| WgSim | doi: 10.1093/bioinformatics/btp352 | - | - | - | dummy quality scores | - | - | - | uniform distribution error | yes by simulating INDEL polymorphisms | - | - |
| Mason | Holtgrewe M (2010) Mason - a read simulator for second generation sequencing data. Technical report, FU Berlin. | yes | yes | - | yes | - | - | no | yes | random | - | 454, Illumina, Sanger |
| SimSeq | Available: https://github.com/jstjohn/SimSeq. Accessed 2012 October 10th. | yes | yes | yes | - | <100bp | - | - | - | - | - | Illumina |
| pIRS | doi: 10.1093/bioinformatics/bts187 | - | yes | - | empirical model based on read cycle | - | yes based on GC | - | empirical model based on read cycle | yes with additional tool | - | - |
| Stampy | doi: 10.1101/gr.111120.110 | - | - | - | - | - | - | - | - | - | - | - |
| MetaSim | doi: 10.1371/journal.pone.0003373 | yes | - | - | no | - | - | - | - | - | - | - |
| FlowSim | doi: 10.1093/bioinformatics/btq365 | yes | - | - | yes | - | - | - | - | - | yes | 454 |
| simNGS | http://www.ebi.ac.uk/goldman-srv/simNGS/ | - | - | - | - | - | - | - | - | - | - | - |
| Grinder | doi: 10.1093/nar/gks251 | - | - | - | - | - | - | - | - | - | - | - |

haploid simulator population fastq • 7.4k views

Ram · Answer 1 · 2014-12-19

9.3 years ago

Joseph Hughes ★ 3.0k

I started off using ArtificialFastqGenerator, but it first ran out of memory and then, when I provided more memory, it took a very long time to run.

In the end, I found that ART was the most useful program. At least I could get it to work without too much difficulty. I generated fastq reads from a number of different references at different depths of coverage and then mixed the resulting read sets together.
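A minimal sketch of that workflow, for anyone wanting to script it: split a target total depth across the haplotypes in proportion to their frequencies, then build one `art_illumina` call per haplotype reference. The haplotype names, FASTA paths, frequencies, read length and fragment sizes below are all made up for illustration, and the flags used (`-p`, `-i`, `-l`, `-f`, `-m`, `-s`, `-o`) should be checked against the help output of your installed ART version. The script only prints the commands; it does not run ART.

```python
def per_haplotype_coverage(frequencies, total_coverage):
    """Split a total fold coverage across haplotypes in proportion to frequency."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Normalise so the per-haplotype coverages always add up to total_coverage.
    return {name: total_coverage * freq / total for name, freq in frequencies.items()}


def art_commands(fasta_by_haplotype, frequencies, total_coverage,
                 read_length=150, mean_fragment=300, sd_fragment=30,
                 out_dir="sim"):
    """Build one paired-end art_illumina command (as an argument list) per haplotype.

    Assumption: the flags below match your ART version; verify before running.
    """
    coverage = per_haplotype_coverage(frequencies, total_coverage)
    commands = []
    for name, fasta in fasta_by_haplotype.items():
        commands.append([
            "art_illumina", "-p",              # paired-end simulation
            "-i", str(fasta),                  # haplotype reference FASTA
            "-l", str(read_length),            # read length
            "-f", f"{coverage[name]:.2f}",     # fold coverage for this haplotype
            "-m", str(mean_fragment),          # mean fragment size
            "-s", str(sd_fragment),            # fragment size standard deviation
            "-o", f"{out_dir}/{name}_",        # output prefix
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical haplotypes and frequencies, for illustration only.
    fastas = {"hapA": "hapA.fa", "hapB": "hapB.fa", "hapC": "hapC.fa"}
    freqs = {"hapA": 0.7, "hapB": 0.2, "hapC": 0.1}
    for cmd in art_commands(fastas, freqs, total_coverage=1000):
        print(" ".join(cmd))
```

Once each haplotype has been simulated, the mixed sample is obtained by concatenating the per-haplotype FASTQ files. For paired-end data, concatenate the first-mate files and the second-mate files in the same haplotype order so that the pairs stay in sync.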

UPDATE: The following blog post is related and very useful - http://scottmyourstone.blogspot.co.uk/2013/10/read-simulators-review-with-emphasis-on.html

Don't you think that ArtificialFastqGenerator excels over ART? How long did it take you to run ArtificialFastqGenerator? What was the specification of the computer that ran it?

Reply written 8.9 years ago by Medhat 9.7k