Question: NGS DNA Read Simulator With Quality Scores Available?
6.3 years ago by
Travis2.7k wrote:

Hi all,

I am looking to generate some simulated Illumina 100bp paired end DNA reads.

I have tried a couple of options so far, including SAMtools wgsim and BFAST's bgeneratereads; however, neither simulates quality scores. Each base is assigned either the same symbol or the same number.

Is anyone aware of software that includes quality scores in its simulation?

Thanks in advance!

Modified 6.3 years ago by Mitch Bekritsky (1.0k) • written 6.3 years ago by Travis (2.7k)
6.3 years ago by
2184687-1231-83-4.8k wrote:

You should definitely try simNGS:

It also simulates the Illumina library preparations. For the simulation, it uses real intensity files from an existing Illumina machine. The latest version has an example 101bp run from an Illumina HiSeq machine with TruSeq chemistry at Sanger.


The simNGS package can simulate paired-end library construction (with adjustable mean and std dev) and sample preparation errors (substitutions and indels) as well. The sample prep error rate is properly incorporated in the simulated quality scores.
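As an illustration of what "paired-end library construction with adjustable mean and std dev" means in practice, here is a minimal Python sketch. It is not simNGS code; the function names (`simulate_pair`, `reverse_complement`) and default parameter values are hypothetical, and it assumes fragment lengths follow a normal distribution and that the reference is at least one read long.

```python
import random

def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def simulate_pair(reference, read_len=100, insert_mean=300.0, insert_sd=30.0, rng=random):
    """Return (read1, read2, start, insert_size) for one simulated fragment."""
    # Draw the fragment (insert) length from a normal distribution, then clamp it
    # so it is at least one read long and no longer than the reference.
    insert = int(round(rng.gauss(insert_mean, insert_sd)))
    insert = max(read_len, min(insert, len(reference)))
    # Pick a uniformly random start position for the fragment.
    start = rng.randrange(len(reference) - insert + 1)
    fragment = reference[start:start + insert]
    # Read 1 comes from the 5' end of the fragment; read 2 is the reverse
    # complement of the other end, as in standard paired-end sequencing.
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2, start, insert
```

This sketch produces error-free reads only; a real simulator would also add substitutions and indels and assign quality scores consistent with them.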

Reply written 6.3 years ago by Botond Sipos (1.5k)
6.3 years ago by
Nilshomer100 wrote:

You can also try the dwgsim program in the DNAA package. The package also has two programs to assess the sensitivity/specificity of your mapping (dwgsim_eval) and pileup (dwgsim_pileup_eval).


Does this do quality scores, Nils?

Reply written 6.3 years ago by Travis (2.7k)
6.3 years ago by
Mitch Bekritsky1.0k
London, England
Mitch Bekritsky1.0k wrote:

I've used MAQ simulate to get reads with simulated quality scores before. Instead of using real intensity files as simNGS does, it generates a transition matrix from FASTQ file(s) and uses it to simulate read quality. I like this option because it lets me train on reads previously obtained from the same machine, which I feel gives quality scores that are a good representation of what I can expect in the future from that machine or sequencing core. (The sequencing facility I get my data from doesn't keep raw intensity files for more than 2 weeks.)

It also creates paired-end reads with an insert size mean and std dev you can tweak, and has some other options for substitution and indel frequency.
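To make the transition-matrix idea concrete, here is a rough Python sketch of a position-specific, first-order Markov model of quality characters trained on existing FASTQ data. This is an assumption about the general technique, not MAQ's actual implementation; all function names are hypothetical, and the parser assumes plain four-line FASTQ records.

```python
import random
from collections import defaultdict

def read_qualities(fastq_lines):
    """Yield the quality string (every 4th line) from four-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")

def train(quality_strings):
    """Count first-base qualities and per-position quality transitions."""
    first = defaultdict(int)
    # (position, previous_char) -> next_char -> count
    trans = defaultdict(lambda: defaultdict(int))
    for q in quality_strings:
        if not q:
            continue
        first[q[0]] += 1
        for pos in range(1, len(q)):
            trans[(pos, q[pos - 1])][q[pos]] += 1
    return first, trans

def _draw(counts, rng):
    """Sample one key from a {char: count} table, weighted by count."""
    chars = list(counts)
    return rng.choices(chars, weights=[counts[c] for c in chars])[0]

def simulate_quality(first, trans, length, rng=random):
    """Sample one quality string by walking the position-specific chain."""
    qual = [_draw(first, rng)]
    for pos in range(1, length):
        counts = trans.get((pos, qual[-1]))
        if not counts:
            # State never seen in training: repeat the previous quality.
            qual.append(qual[-1])
        else:
            qual.append(_draw(counts, rng))
    return "".join(qual)
```

Typical use would be `first, trans = train(read_qualities(open("reads.fastq")))` followed by `simulate_quality(first, trans, 100)`, where `reads.fastq` is a placeholder for a file from your own machine. Because the counts are kept per position, the simulated qualities reproduce the 3' quality drop seen in the training reads.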

6.3 years ago by
Benm710 wrote:

I previously wrote a program for simulating NGS reads. It supports Solexa/Illumina FASTQ, SOLiD/ABI color space, and 454/Roche fna/qual formats; handles paired-end reads, mate pairs, reads containing adapters, primers, or cloning vector sequence, and enzyme digestion sites; and generates variation including SNPs, indels, and SVs. It has not been released yet. As for your question, I don't think you need to focus on simulating quality scores for your simulated Illumina reads. Random qualities are fine, although in real reads the last 5~20 bp at the 3' end have lower quality, so you could use a subroutine like this (Perl):

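# Usage: generate_qual(\$quality, $Reads_length);
# Appends $Reads_length random quality characters to the string referenced by $quality.
sub generate_qual {
    my ($quality, $Reads_length) = @_;
    # All but the last 5 bases: characters with ASCII codes 74..109.
    for (my $i = 0; $i < $Reads_length - 5; $i++) {
        $$quality .= chr(int(rand(36)) + 74);
    }
    # Last 5 bases (3' end): wider range with a lower floor, ASCII codes 64..109.
    for (my $i = $Reads_length - 5; $i < $Reads_length; $i++) {
        $$quality .= chr(int(rand(46)) + 64);
    }
}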

Here, $Reads_length = 100 to match your 100 bp reads.
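To see what quality values the two `rand` expressions in the subroutine correspond to, the short Python sketch below (Python rather than Perl, purely for illustration) converts the ASCII ranges to Phred scores. The offset is an assumption: the character range suggests the Phred+64 encoding used by Illumina pipelines of that era, while Sanger-style FASTQ and newer Illumina output use Phred+33. The helper name `phred_range` is hypothetical.

```python
def phred_range(low_ascii, high_ascii, offset):
    """Return the (min, max) Phred scores for a range of ASCII codes."""
    return low_ascii - offset, high_ascii - offset

# Ranges produced by chr(int(rand(36))+74) and chr(int(rand(46))+64).
body = (74, 74 + 36 - 1)  # ASCII 74..109, used for all but the last 5 bases
tail = (64, 64 + 46 - 1)  # ASCII 64..109, used for the last 5 bases

for name, (lo, hi) in (("body", body), ("tail", tail)):
    print(name,
          "Phred+64:", phred_range(lo, hi, 64),
          "Phred+33:", phred_range(lo, hi, 33))
```

Under the Phred+64 assumption, the body of each read gets Q10 to Q45 and the last five bases get Q0 to Q45. Read as Phred+33, the same characters would decode to Q41 to Q76 and Q31 to Q76, so the output would need an offset conversion before being given to a tool that expects Phred+33.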


I uploaded my program to SourceForge:

Reply written 6.2 years ago by Benm (710)