Question

What NGS Read Simulators Are Available for Paired-End Data?

13.7 years ago

Aaronquinlan 12k

Hi all, I need to create simulated paired-end sequence data with fixed read lengths on each end (e.g., 75-mers on each end of a 500 bp DNA fragment, à la Illumina). Does anyone know of a reliable simulator that can generate paired-end sequences to a requested depth, with a requested insert size/variance and error rate, for a requested genome in a FASTA file? The output would preferably be two FASTQ files, one for each end.

I can write my own, but do not want to re-invent this boring (though useful) wheel. Any clues?

next-gen sequencing fastq simulation paired • 21k views

Written 13.7 years ago by Aaronquinlan 12k • updated 17 months ago by Ram 43k
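For anyone weighing the write-your-own option, here is a minimal Python sketch of what such a simulator does. It rests on simplifying assumptions: a single reference sequence held in memory, fragment lengths drawn from a normal distribution, uniform substitution errors only (no indels, no quality model), and Illumina-style pairs where read 1 is the forward strand at the left end of the fragment and read 2 is the reverse complement of the right end. The function names and the read-naming scheme are hypothetical; the tools in the answers are far more complete.

```python
import random
from typing import Iterator, Tuple

# Complement table for A/C/G/T/N (upper and lower case).
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def add_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Substitute each base with a different random base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base.upper()]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pairs(ref: str, n_pairs: int, read_len: int = 75,
                   mean_frag: float = 500, sd_frag: float = 50,
                   error_rate: float = 0.0, seed: int = 0
                   ) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, read1, read2) for n_pairs fragments sampled from ref.

    Fragment lengths ~ Normal(mean_frag, sd_frag); draws shorter than one
    read or longer than the reference are rejected and redrawn.
    """
    if len(ref) < read_len:
        raise ValueError("reference is shorter than one read")
    rng = random.Random(seed)
    made = 0
    while made < n_pairs:
        frag_len = round(rng.gauss(mean_frag, sd_frag))
        if frag_len < read_len or frag_len > len(ref):
            continue
        start = rng.randrange(len(ref) - frag_len + 1)
        frag = ref[start:start + frag_len]
        r1 = add_errors(frag[:read_len], error_rate, rng)             # forward, left end
        r2 = add_errors(revcomp(frag[-read_len:]), error_rate, rng)   # reverse, right end
        made += 1
        # Hypothetical naming: pair index, 0-based fragment start, fragment length.
        yield f"pair{made}_{start}_{frag_len}", r1, r2


def to_fastq(name: str, seq: str, mate: int, qual_char: str = "I") -> str:
    """One FASTQ record with a constant (placeholder) quality string."""
    return f"@{name}/{mate}\n{seq}\n+\n{qual_char * len(seq)}\n"
```

To get the two FASTQ files asked for, loop over simulate_pairs(...) and write to_fastq(name, r1, 1) to one file and to_fastq(name, r2, 2) to the other. Rejection sampling keeps every fragment inside the reference, at the cost of slightly truncating the tails of the length distribution.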

See also the following thread discussing read simulation with quality scores: http://bit.ly/kNePbA

Reply written 13.0 years ago by Botond Sipos ★ 1.7k

Ram · Answer 1 · 2010-08-19

samtools wgsim does most of what you request:

Usage:   wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>

Options: -e FLOAT      base error rate [0.020]
         -d INT        outer distance between the two ends [500]
         -s INT        standard deviation [50]
         -N INT        number of read pairs [1000000]
         -1 INT        length of the first read [70]
         -2 INT        length of the second read [70]
         -r FLOAT      rate of mutations [0.0010]
         -R FLOAT      fraction of indels [0.10]
         -X FLOAT      probability an indel is extended [0.30]
         -c            generate reads in color space (SOLiD reads)
         -C            show mismatch info in comment rather than read name
         -h            haplotype mode

Note: For SOLiD reads, the first read is F3 and the second is R3.
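wgsim's -d is described as the outer distance between the two ends. Reading that (an assumption) as the full fragment span, from the first base of read 1 to the last base of read 2, the question's 75-mers on 500 bp fragments correspond to -1 75 -2 75 -d 500, and the unsequenced gap between the mates is whatever remains after subtracting both read lengths. A tiny Python helper makes the arithmetic explicit:

```python
def inner_distance(outer: int, len1: int, len2: int) -> int:
    """Unsequenced gap between mates, given the outer (fragment) distance.

    Assumes 'outer distance' spans from the first base of read 1 to the
    last base of read 2. A negative value means the two reads overlap.
    """
    return outer - len1 - len2


# The question's example: 75-mers on each end of a 500 bp fragment.
print(inner_distance(500, 75, 75))           # 350
# wgsim's defaults: -d 500, -1 70, -2 70.
print(inner_distance(500, 70, 70))           # 360
# Two standard deviations (-s 50) below the mean, the gap is still positive.
print(inner_distance(500 - 2 * 50, 75, 75))  # 250
```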

score 8 · Answer 2 · 2010-08-20

13.7 years ago

Istvan Albert 100k

MetaSim may be a good option. It has platform-specific error modeling, which makes it well suited to generating realistic input data rather than "perfectly" random reads.

Written 13.7 years ago by Istvan Albert 100k
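To make the contrast with "perfectly" random reads concrete: a single base error rate applies the same substitution probability at every cycle, whereas a platform-specific model lets it vary along the read. The Python sketch below uses a made-up linear ramp from 0.1% to 2%; it is not MetaSim's model or a measured Illumina profile, only an illustration of the idea. It also derives matching Phred+33 quality characters via Q = -10 * log10(p). All names are hypothetical.

```python
import math
import random
from typing import List


def error_profile(read_len: int, start_rate: float = 0.001,
                  end_rate: float = 0.02) -> List[float]:
    """Per-cycle substitution probabilities rising linearly along the read.

    The linear shape and default rates are hypothetical placeholders.
    """
    if read_len == 1:
        return [start_rate]
    step = (end_rate - start_rate) / (read_len - 1)
    return [start_rate + i * step for i in range(read_len)]


def phred(p: float, max_q: int = 41) -> int:
    """Phred score Q = -10 * log10(p), rounded and capped at max_q."""
    if p <= 0:
        return max_q
    return min(max_q, round(-10 * math.log10(p)))


def quality_string(profile: List[float]) -> str:
    """Phred+33 (Sanger-style) quality characters matching the profile."""
    return "".join(chr(33 + phred(p)) for p in profile)


def apply_profile(seq: str, profile: List[float], rng: random.Random) -> str:
    """Substitute base i with a different base with probability profile[i]."""
    out = []
    for base, p in zip(seq, profile):
        if rng.random() < p:
            out.append(rng.choice([b for b in "ACGT" if b != base.upper()]))
        else:
            out.append(base)
    return "".join(out)
```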

Another solid choice, thank you.

Reply written 13.7 years ago by Aaronquinlan 12k

Ram · Answer 3 · 2010-10-25

You can also try dwgsim, a fork of SAMtools wgsim written by Nils Homer.

Usage:   dwgsim [options] <in.ref.fa> <out.bwa.read1.fq> <out.bwa.read2.fq> <out.bfast.fq>

Options: -e FLOAT      base error rate [0.020]
         -E FILE       base/color error rate file
         -d INT        outer distance between the two ends [500]
         -s INT        standard deviation [50]
         -N INT        number of read pairs [1000000]
         -1 INT        length of the first read [70]
         -2 INT        length of the second read [70]
         -r FLOAT      rate of mutations [0.0010]
         -R FLOAT      fraction of indels [0.10]
         -X FLOAT      probability an indel is extended [0.30]
         -n INT        maximum number of Ns allowed in a given read [0]
         -c            generate reads in color space (SOLiD reads)
         -h            haplotype mode

Ram · Answer 4 · 2015-07-21

8.8 years ago

User 59 13k

pIRS: Profile-based Illumina pair-end reads simulator.

Or ART

Or simNGS

There's more on this OmicsTools page.

Written 8.8 years ago by User 59 13k • updated 4.6 years ago by Ram 43k

score 1 · Answer 5 · 2010-10-25

Note the difference between Illumina's paired ends (simply reading from each end of a clone) and circularized clones (mate pairs), which give longer inserts but reads in different orientations, and probably more artifacts such as chimerae.
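A simplified Python model of the two layouts, assuming Illumina-style conventions: in a paired-end library, read 1 is the forward strand at the left end of the fragment and read 2 is the reverse complement of the right end, so the reads face each other ("FR"); in an Illumina-style mate-pair library the reads face away from each other ("RF"). Other mate-pair chemistries, including 454's, may orient reads differently, and this sketch ignores the chimeras and junction artifacts real mate-pair data contain. Function names are hypothetical.

```python
from typing import Tuple

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def paired_end(fragment: str, read_len: int) -> Tuple[str, str]:
    """FR layout: read 1 forward at the left end, read 2 reverse at the right end."""
    return fragment[:read_len], revcomp(fragment[-read_len:])


def mate_pair(fragment: str, read_len: int) -> Tuple[str, str]:
    """Simplified RF layout: read 1 reverse at the left end, read 2 forward
    at the right end, so the reads point away from each other."""
    return revcomp(fragment[:read_len]), fragment[-read_len:]


print(paired_end("AACGTTGCAGGC", 4))  # ('AACG', 'GCCT')
print(mate_pair("AACGTTGCAGGC", 4))   # ('CGTT', 'AGGC')
```

A simulator that only produces FR pairs will therefore not exercise an aligner's mate-pair handling, which is one reason to check which orientation a given tool emits.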

(BTW, I've written a simulator for 454 data (flowsim); feel free to contact me if you're interested in seeing it extended to paired-end, or rather mate-paired, sequences.)

score 0 · Answer 6 · 2015-07-01

8.8 years ago

sacha ★ 2.4k

I don't understand how to set the depth with wgsim.

Written 8.8 years ago by sacha ★ 2.4k

Via the read length, the number of reads, and the length of the input sequence?

Reply written 8.8 years ago by Aerval ▴ 290
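Per its usage text, wgsim has no depth option; -N sets the number of read pairs. Under the usual approximation that mean depth is total sequenced bases divided by genome length, depth = N * (len1 + len2) / G, so N = depth * G / (len1 + len2), rounded up. A small Python sketch that computes N and assembles (but does not run) a wgsim argument list using only the options listed in the wgsim usage; the helper names are hypothetical, and G is assumed to be the summed length of all sequences in the FASTA.

```python
import math
from typing import List


def pairs_for_depth(depth: float, genome_len: int, len1: int, len2: int) -> int:
    """Read pairs needed so that N * (len1 + len2) / genome_len >= depth."""
    return math.ceil(depth * genome_len / (len1 + len2))


def wgsim_args(ref: str, out1: str, out2: str, depth: float, genome_len: int,
               len1: int = 75, len2: int = 75, outer: int = 500,
               sd: int = 50, err: float = 0.02) -> List[str]:
    """Build a wgsim command line (not executed) for a target mean depth."""
    n = pairs_for_depth(depth, genome_len, len1, len2)
    return ["wgsim", "-e", str(err), "-d", str(outer), "-s", str(sd),
            "-N", str(n), "-1", str(len1), "-2", str(len2),
            ref, out1, out2]


# 30x of a 4,600,000 bp genome with 2 x 75 bp reads -> 920000 pairs.
print(pairs_for_depth(30, 4_600_000, 75, 75))
print(" ".join(wgsim_args("ref.fa", "r1.fq", "r2.fq", 30, 4_600_000)))
```

Passing the list to subprocess.run(..., check=True) would execute it, assuming wgsim is on the PATH.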

Ram · Answer 7 · 2015-07-21

8.8 years ago

Brian Bushnell 20k

RandomReads, in the BBMap package, supports paired reads. For example:

randomreads.sh ref=ref.fa out=reads.fq paired interleaved reads=100k length=150 mininsert=200 maxinsert=400 gaussian

Written 8.8 years ago by Brian Bushnell 20k • updated 4.6 years ago by Ram 43k
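With the "interleaved" flag, the RandomReads command writes both mates to a single file, whereas the question asked for two FASTQ files. A minimal Python sketch of splitting such a file, assuming strictly alternating mates (read 1, read 2, read 1, ...) and unwrapped four-line FASTQ records; the function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def paired_records(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (mate1, mate2) records, each a list of 4 FASTQ lines.

    Assumes mates strictly alternate and each record is exactly 4 lines.
    """
    it = iter(lines)
    while True:
        rec1 = list(islice(it, 4))
        if not rec1:
            return
        rec2 = list(islice(it, 4))
        if len(rec1) != 4 or len(rec2) != 4:
            raise ValueError("truncated record or unpaired final read")
        yield rec1, rec2


def deinterleave(path: str, path1: str, path2: str) -> int:
    """Split an interleaved FASTQ into two files; returns the number of pairs."""
    n = 0
    with open(path) as inp, open(path1, "w") as out1, open(path2, "w") as out2:
        for rec1, rec2 in paired_records(inp):
            out1.writelines(rec1)
            out2.writelines(rec2)
            n += 1
    return n
```

For the command above, deinterleave("reads.fq", "reads_1.fq", "reads_2.fq") would produce the two per-end files. Streaming record by record keeps memory use constant regardless of file size.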

I have started to have the feeling that everything is implemented in the BBMap package :-)

Reply written 8.8 years ago by Istvan Albert 100k • updated 17 months ago by Ram 43k

That's my ultimate goal... haven't quite reached it yet!

Reply written 8.8 years ago by Brian Bushnell 20k • updated 17 months ago by Ram 43k

Hi Brian! Is it possible to generate reads only within specific intervals, i.e., WES-like read simulation?

Reply written 6.8 years ago by user230613 ▴ 360

No, unfortunately not. You'd have to use something like bedtools to pull out the exome FASTA using the genome FASTA and the bait coordinates, and then run RandomReads on the result. I don't currently have anything to parse BED files, but that does seem like a good addition to RandomReads.

Reply written 6.8 years ago by Brian Bushnell 20k
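The suggested pre-step is a job for bedtools; purely to illustrate the logic (not bedtools' implementation), here is a minimal Python sketch that pulls BED intervals out of a FASTA. It assumes the standard BED convention of 0-based, half-open coordinates and an in-memory genome, and adds a hypothetical "flank" option that pads each interval (clipped at sequence ends), since targets shorter than the simulated fragment length may leave little room for full-length pairs. Function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; name is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def bed_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def extract_targets(genome: Dict[str, str],
                    intervals: Iterable[Tuple[str, int, int]],
                    flank: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield ("chrom:start-end", sequence) per interval, padded by flank bases."""
    for chrom, start, end in intervals:
        seq = genome[chrom]
        s, e = max(0, start - flank), min(len(seq), end + flank)
        yield f"{chrom}:{s}-{e}", seq[s:e]
```

Writing each result out as a FASTA record (">" plus the name, then the sequence) gives a target-only reference that can be passed to RandomReads as ref=.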

Thank you for the fast answer. I'll try the bedtools pre-step. Another issue: I've noticed that in PE mode the read names in the two output files don't correspond to each other. Is there an option for this?

Reply written 6.8 years ago by user230613 ▴ 360

Yes, add the flag "illuminanames".

Reply written 6.8 years ago by Brian Bushnell 20k
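Whatever naming option is used, it is cheap to verify that the two files really are in sync before mapping. A minimal Python sketch, assuming unwrapped four-line FASTQ records and that mates share an identifier once any text after the first space and a trailing "/1" or "/2" are removed. That covers the two common conventions; the sketch does not rely on (or confirm) the exact format "illuminanames" produces. Function names are hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterable, Optional


def mate_id(header: str) -> str:
    """Identifier shared by both mates: first word after '@', minus '/1' or '/2'."""
    token = header[1:] if header.startswith("@") else header
    token = token.split()[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def first_mismatch(lines1: Iterable[str], lines2: Iterable[str]) -> Optional[int]:
    """1-based index of the first pair whose mate names differ (or where one
    file runs out early); None if every pair matches."""
    headers1 = islice(lines1, 0, None, 4)  # every 4th line is a header
    headers2 = islice(lines2, 0, None, 4)
    for i, (h1, h2) in enumerate(zip_longest(headers1, headers2), start=1):
        if h1 is None or h2 is None or mate_id(h1) != mate_id(h2):
            return i
    return None
```

Opening both files and calling first_mismatch(f1, f2) reads them in a single streaming pass.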

Is it possible to generate RNA-seq reads with BBMap?

Reply written 5.8 years ago by k.kathirvel93 ▴ 300