Question: Simulation of single-end reads from reference genome
newscient (European Union) wrote, 15 months ago:

Hi, I am trying to simulate single-end sequencing reads from a reference genome, with the reads uniformly distributed. I have the desired read length and number of reads. I could easily use DWGSIM (https://github.com/nh13/DWGSIM) to generate reads randomly, but I am looking for a tool that makes it easier to get the uniform coverage of the genome that I want.

Thanks in advance!

simulation reads • 582 views

What do you mean by "uniform" coverage?

written 15 months ago by Gabriel R.

Sampling reads from a uniform distribution sounds better!?

written 15 months ago by newscient

OK, just checking to make sure :-) If read start positions are drawn from a uniform distribution, the coverage at any given site will be (approximately) Poisson distributed, not constant.
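To illustrate the Poisson point, here is a minimal sketch in plain Python. The genome length, read length, read count and the function name `coverage_from_random_starts` are all made up for the example, and no sequence or sequencing errors are simulated; only start positions are drawn. For a Poisson-like distribution the variance of the per-base depth should come out close to its mean, rather than zero as it would for perfectly even coverage.

```python
import random

def coverage_from_random_starts(genome_len, read_len, n_reads, seed=0):
    """Per-base depth when read starts are drawn uniformly at random."""
    rng = random.Random(seed)
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_len + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    # A running sum of the difference array gives the depth at each position.
    cov, depth = [], 0
    for d in diff[:genome_len]:
        depth += d
        cov.append(depth)
    return cov

# Hypothetical numbers: 100 kb genome, 100 bp reads, 20,000 reads (about 20x).
cov = coverage_from_random_starts(genome_len=100_000, read_len=100, n_reads=20_000)

# Skip the first and last read length, where depth tapers off at the ends.
interior = cov[100:-100]
mean_depth = sum(interior) / len(interior)
var_depth = sum((c - mean_depth) ** 2 for c in interior) / len(interior)

print(f"mean depth {mean_depth:.2f}, variance {var_depth:.2f}")
print(f"min depth {min(interior)}, max depth {max(interior)}")
```

The spread between the minimum and maximum depth is what a "uniformly distributed" random sampler will give you, so it is worth deciding whether that is acceptable before picking a tool.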

I wrote gargammel, a simulator for ancient DNA:

https://grenaud.github.io/gargammel/

It can be used for modern DNA as well; just leave out the ancient DNA idiosyncrasies. It uses ART to simulate sequencing errors. I know that ART can simulate different coverage, but I do not know whether ART can add adapters when the fragment length is less than the read length. gargammel does this, and it also lets you specify the desired coverage.

written 15 months ago by Gabriel R.

Do you want the probability distribution of read sampling at each position to be uniform, or do you want uniform coverage across your genome? If you want uniform coverage, then you can't use a random generator. Uniform 50x coverage with 100bp reads means generating 1 read every 2 bases; uniform 100x coverage with 100bp reads requires 1 read starting at each genomic position; and so on.
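The arithmetic above (step = read length / coverage) can be sketched as a deterministic tiling. This is only an illustration under simplifying assumptions: `tiled_reads` and `per_base_coverage` are hypothetical helper names, the genome is a plain string treated as linear, reads are error-free forward-strand substrings, and the coverage must divide the read length evenly.

```python
def tiled_reads(genome, read_len, coverage):
    """Yield (start, sequence) for reads placed at a fixed step along the genome."""
    if read_len % coverage != 0:
        raise ValueError("this sketch needs coverage to divide read_len evenly")
    step = read_len // coverage  # e.g. 100 bp reads at 50x -> one read every 2 bases
    for start in range(0, len(genome) - read_len + 1, step):
        yield start, genome[start:start + read_len]

def per_base_coverage(genome_len, starts, read_len):
    """Depth at every position, given the read start positions."""
    diff = [0] * (genome_len + 1)
    for s in starts:
        diff[s] += 1
        diff[s + read_len] -= 1
    cov, depth = [], 0
    for d in diff[:genome_len]:
        depth += d
        cov.append(depth)
    return cov

# Toy 1 kb "genome" purely for demonstration.
genome = "ACGT" * 250
starts = [s for s, _ in tiled_reads(genome, read_len=100, coverage=50)]
cov = per_base_coverage(len(genome), starts, read_len=100)

# Away from the two ends, every position has exactly the requested depth.
print(set(cov[100:900]))
```

On a linear sequence the first and last read length of the genome still taper off, because no read can start before position 0 or run past the end; only the interior is perfectly even.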

written 15 months ago by d-cameron

You can also try randomreads.sh from the BBMap suite. Check the in-line help for the various options.

written 15 months ago by genomax

Is there a non-random equivalent? OP appears to want perfectly uniform coverage, and to get that you can't use a random sampling strategy.

written 15 months ago by d-cameron

randomreads.sh is the name of the program. It has many options to generate simulated data.

written 15 months ago by genomax
Sej Modha (Glasgow, UK) wrote, 15 months ago:

See the related posts "Reads Simulation" and "Read Simulator".
