Question: Simulation of single-end reads from reference genome
2.9 years ago by
newscient20
European Union
newscient20 wrote:

Hi, I am trying to simulate single-end sequencing reads from a reference genome, with the reads uniformly distributed across it. I have the desired read length and number of reads. I could easily use DWGSIM (https://github.com/nh13/DWGSIM) to generate reads randomly, but I am looking for a tool that makes it easier to get uniform coverage of the genome. Is there one?

Thanks in advance!

simulation reads • 1.0k views
modified 2.9 years ago • written 2.9 years ago by newscient20

What do you mean by "uniform" coverage?

written 2.9 years ago by Gabriel R.2.8k

Sampling reads using a uniform distribution sounds better?

written 2.9 years ago by newscient20

OK, just checking to make sure :-) Because if you sample read positions from a uniform distribution, the coverage at any given site will be Poisson distributed.
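To illustrate the point above, here is a minimal sketch in Python (standard library only) that samples read start positions uniformly at random and tallies per-base coverage. The function name `simulate_coverage` and the parameter values are made up for this example, and the genome is treated as circular purely as a simplifying assumption so that edge effects do not distort the numbers.

```python
import random
from statistics import mean, pvariance


def simulate_coverage(genome_length, read_length, n_reads, seed=0):
    """Per-base coverage from reads whose starts are sampled uniformly.

    Assumption for this sketch: the genome is circular, so a read that
    runs past the end wraps around to position 0.
    """
    rng = random.Random(seed)
    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_length)
        end = start + read_length
        if end <= genome_length:
            diff[start] += 1
            diff[end] -= 1
        else:
            # Read wraps: cover start..end of genome, then 0..remainder.
            diff[start] += 1
            diff[genome_length] -= 1
            diff[0] += 1
            diff[end - genome_length] -= 1
    coverage = []
    depth = 0
    for i in range(genome_length):
        depth += diff[i]
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    cov = simulate_coverage(genome_length=10_000, read_length=100, n_reads=2_000, seed=1)
    print("mean coverage:", mean(cov))
    print("variance     :", pvariance(cov))
    print("min / max    :", min(cov), "/", max(cov))
```

The mean comes out at exactly reads × read length / genome length, but the depth at individual positions scatters around it, with a variance of roughly the same size as the mean, which is what a Poisson-like distribution looks like.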

I coded gargammel which is a simulator for ancient DNA:

https://grenaud.github.io/gargammel/

It can be used for modern DNA as well; just leave out the ancient DNA idiosyncrasies. It uses ART to simulate sequencing errors. I know that ART can simulate different coverage. I do not know if ART can add adapters when the fragment length is less than the read length, but gargammel does this. gargammel also lets you specify the desired coverage.

modified 2.9 years ago • written 2.9 years ago by Gabriel R.2.8k

Do you want the probability distribution of read sampling at each position to be a uniform distribution, or do you want uniform coverage across your genome? If you want uniform coverage, then you can't use a random generator. If you want uniform 50x coverage using 100 bp reads, you'll have to generate 1 read every 2 bases; uniform 100x coverage with 100 bp reads requires simulating 1 read starting at each genomic position; and so on.
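The arithmetic above (spacing between read starts = read length / coverage) can be sketched as a small deterministic generator. This is only an illustration: `tiled_read_starts` and `extract_reads` are hypothetical helper names, not part of any tool mentioned in this thread, and the sketch assumes a linear genome, error-free forward-strand reads, and integer read length and coverage.

```python
from fractions import Fraction


def tiled_read_starts(genome_length, read_length, coverage):
    """Evenly spaced read start positions giving uniform coverage.

    The spacing is read_length / coverage, e.g. 100 bp reads at 50x
    gives one read every 2 bases. Fractions keep the spacing exact
    when it is not a whole number.
    """
    step = Fraction(read_length, coverage)
    last_start = genome_length - read_length  # last start that still fits
    starts = []
    i = 0
    while True:
        start = int(i * step)  # floor, since the value is non-negative
        if start > last_start:
            break
        starts.append(start)
        i += 1
    return starts


def extract_reads(genome, read_length, coverage):
    """Cut error-free reads out of `genome` at the tiled start positions."""
    return [
        genome[s:s + read_length]
        for s in tiled_read_starts(len(genome), read_length, coverage)
    ]


if __name__ == "__main__":
    starts = tiled_read_starts(genome_length=1_000, read_length=100, coverage=50)
    print(starts[:5])
    # Depth at an interior position: count reads overlapping position 500.
    print(sum(1 for s in starts if s <= 500 < s + 100))
```

One caveat of this sketch: on a linear genome the first and last `read_length - 1` bases get less than the target depth, because fewer reads can overlap them. Treating the genome as circular, or padding the ends, would be one way around that.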

modified 2.9 years ago • written 2.9 years ago by d-cameron2.2k

You can also try randomreads.sh from the BBMap suite. Check the in-line help for the various options.

written 2.9 years ago by genomax91k

Is there a non-random equivalent? The OP appears to want perfectly uniform coverage, and to get that you can't use a random sampling strategy.

written 2.9 years ago by d-cameron2.2k

randomreads.sh is the name of the program. It has many options to generate simulated data.

written 2.9 years ago by genomax91k
2.9 years ago by
Sej Modha4.7k
Glasgow, UK
Sej Modha4.7k wrote:

Reads Simulation and Read Simulator

written 2.9 years ago by Sej Modha4.7k