Question: Generating random DNA sequence and paired-end alignment
ThePresident wrote (4.3 years ago):

Hello Biostars community,

I would like to simulate structural variant calling (i.e. genomic inversions, deletions, and insertions) in order to understand some experimental results I am getting.

My idea is:

  1. Generate a random DNA sequence of defined length (e.g. 1 Mbp) with equal probabilities of A/T/C/G and store it as fasta1.
  2. Manually introduce genomic inversions/deletions/insertions/duplications etc. and store the result as fasta2.
  3. Tricky part: use the sequence from fasta2 to generate random paired-end data in FASTQ format (i.e. random but unique headers, reads of defined length derived from fasta2, all at the highest quality). These paired-end "reads" also need a defined insert length (say 500 bp with some standard deviation).
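
For step 1, a minimal Python sketch using only the standard library: it draws each base uniformly from A/C/G/T and writes a single-record FASTA file wrapped at 60 characters. The function names `random_dna` and `write_fasta`, the record name and the file name `fasta1.fa` are illustrative assumptions, not taken from any existing tool.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string; A, C, G and T each have probability 1/4."""
    rng = random.Random(seed)  # a fixed seed makes the genome reproducible
    return "".join(rng.choices("ACGT", k=length))


def write_fasta(path, name, seq, width=60):
    """Write one sequence to a FASTA file, wrapping lines at `width` characters."""
    with open(path, "w") as fh:
        fh.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    genome = random_dna(1_000_000, seed=42)  # 1 Mbp
    write_fasta("fasta1.fa", "random_genome", genome)
```

Because every base is drawn independently, long exact repeats are very unlikely in a sequence of this size, which is the "free of such structures" property asked for later in the thread.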

Since my coding knowledge is basic-next-to-nothing, I am not sure whether this is actually possible, and I have no idea whether I should use R, Python, or something else. Any help or existing scripts would be highly appreciated.

Thank you in advance.

Tags: python, simulation, R
GenoMax (United States) wrote (4.3 years ago):
  1. You could grab a bacterial genome from GenBank.
  2. You are going to have to do this manually.
  3. Use randomreads.sh from BBMap (there is a guide thread for it).
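
randomreads.sh is the tool suggested above for step 3. To show roughly what such a simulator does, here is a simplified Python sketch under stated assumptions: the genome is a single-record FASTA at least `read_len` bases long, reads are error-free, and insert sizes follow a normal distribution. All function names are hypothetical. For each pair it draws a fragment length, a random start position and a random strand, reports read 1 from one end of the fragment and read 2 as the reverse complement of the other end, and gives every base the quality character 'I' (Phred 40 in Phred+33 encoding).

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(path):
    """Concatenate all sequence lines of a single-record FASTA file."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))


def simulate_pairs(genome, n_pairs, read_len=150, insert_mean=500,
                   insert_sd=50, seed=None):
    """Yield (name, read1, read2) tuples sampled from `genome`.

    The insert length is drawn from a normal distribution and clamped to
    [read_len, len(genome)]. Half of the fragments come from the reverse
    strand. No sequencing errors are added.
    """
    rng = random.Random(seed)
    for i in range(n_pairs):
        frag_len = int(round(rng.gauss(insert_mean, insert_sd)))
        frag_len = max(read_len, min(frag_len, len(genome)))
        start = rng.randint(0, len(genome) - frag_len)
        fragment = genome[start:start + frag_len]
        if rng.random() < 0.5:
            fragment = revcomp(fragment)  # simulate a reverse-strand fragment
        r1 = fragment[:read_len]
        r2 = revcomp(fragment[-read_len:])
        # The pair index keeps names unique; position and insert help checking.
        yield f"sim_{i}_pos{start}_ins{frag_len}", r1, r2


def write_fastq_pairs(pairs, path1, path2):
    """Write pairs to two FASTQ files with maximum quality ('I')."""
    with open(path1, "w") as f1, open(path2, "w") as f2:
        for name, r1, r2 in pairs:
            f1.write(f"@{name}/1\n{r1}\n+\n{'I' * len(r1)}\n")
            f2.write(f"@{name}/2\n{r2}\n+\n{'I' * len(r2)}\n")
```

For example, `write_fastq_pairs(simulate_pairs(read_fasta("fasta2.fa"), 100_000), "sim_R1.fq", "sim_R2.fq")` would write 100,000 pairs of 150 bp reads with a 500 ± 50 bp insert. Putting the forward-strand start and the insert length in the read name makes it easy to check where the aligner placed each pair.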

I am already working with bacterial genomes. They frequently contain inversions, duplications, etc., so I want to generate a random sequence that will (hopefully) be free of such structures. Thanks for No. 3 :)

(Reply by ThePresident, 4.3 years ago)

If you need a random sequence, see the thread "Generate Random Dna Sequence Data With Equal Base Frequencies".

Or use one of these two online sites:

http://users-birc.au.dk/biopv/php/fabox/random_sequence_generator.php
http://www.faculty.ucr.edu/~mmaduro/random.htm

(Reply by GenoMax, 4.3 years ago)

Everything works well; I didn't think I would pull this off so easily. All the tools are already there :) BTW, is there any chance of an automated generator of inversions/duplications and such (point 2)? I am doing it manually, and it's a bit time-consuming.

(Reply by ThePresident, 4.3 years ago)
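
One way to automate point 2 is a short script that applies random events to the fasta1 sequence and records them as a truth set for the SV caller. The sketch below is hypothetical (`apply_sv` and `random_svs` are illustrative names, and the length range is an arbitrary choice). It applies events one after another and assumes the input is longer than `max_len`.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_sv(seq, kind, start, length, rng):
    """Return `seq` with one structural variant of `length` bp at `start`.

    inversion:   the segment is replaced by its reverse complement
    deletion:    the segment is removed
    insertion:   `length` random bases are inserted before `start`
    duplication: the segment is repeated in tandem
    """
    end = start + length
    if kind == "inversion":
        return seq[:start] + revcomp(seq[start:end]) + seq[end:]
    if kind == "deletion":
        return seq[:start] + seq[end:]
    if kind == "insertion":
        novel = "".join(rng.choices("ACGT", k=length))
        return seq[:start] + novel + seq[start:]
    if kind == "duplication":
        return seq[:end] + seq[start:end] + seq[end:]
    raise ValueError(f"unknown SV type: {kind}")


def random_svs(seq, n_events, min_len=500, max_len=5000, seed=None):
    """Apply `n_events` random SVs; return (mutated_seq, events).

    Each event is recorded as (kind, start, length). Because events are
    applied sequentially, a later event's coordinates refer to the sequence
    as already modified by the earlier ones.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(n_events):
        kind = rng.choice(["inversion", "deletion", "insertion", "duplication"])
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(seq) - length)
        seq = apply_sv(seq, kind, start, length, rng)
        events.append((kind, start, length))
    return seq, events
```

For example, `mutant, events = random_svs(genome, 20, seed=1)` on the sequence from fasta1 gives a fasta2 candidate plus a list of what was changed. If you need every coordinate relative to fasta1, generate non-overlapping events first and apply them in descending order of position, so earlier edits never shift later ones.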

BBMap has a recent addition called mutate.sh, which I made for testing the sensitivity of contaminant removal when the contaminants are bacterial strains of the same species. It creates a mutant variant of a genome. For example:

mutate.sh in=ecoli.fasta out=mutant.fasta id=0.95

This will create a mutant version of the original genome with 95% identity to the original. The mutations are random, with no conserved locations (though I may add that option later), so any duplications or inversions in the original will (probabilistically) not be present in the mutant, since they would have received different mutations. However, the general structure will still be similar to a real bacterium. If you want to generate synthetic reads from a bacterium-like sequence with no repeats or inversions, I suggest you run mutate.sh on a real bacterial genome, then use randomreads.sh on the mutant genome. 95% identity should be sufficiently low (averaging one mutation every 20 bp), though it depends on your specific needs.

(Reply by Brian Bushnell, 4.3 years ago)
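
The arithmetic behind "one mutation every 20 bp": 95% identity leaves 5% of positions mismatched, i.e. 1 in 20. The Python sketch below illustrates only that idea with substitutions; it is not the mutate.sh implementation (which may use other mutation types), and `mutate_substitutions` is a hypothetical name.

```python
import random


def mutate_substitutions(seq, identity=0.95, seed=None):
    """Return a copy of `seq` with round((1 - identity) * len(seq)) substitutions.

    Positions are sampled without replacement and each chosen base is replaced
    by a different base, so the mismatch count against the input is exact.
    """
    rng = random.Random(seed)
    bases = list(seq)
    n_mut = round(len(bases) * (1 - identity))
    for pos in rng.sample(range(len(bases)), n_mut):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)
```

On a 1 Mbp genome with `identity=0.95`, this makes 50,000 substitutions, one per 20 bp on average.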
Johan Zicola wrote (2.4 years ago):

I wrote a Python script with the functions you would need to test structural variant calling on either randomly generated FASTQ files or FASTQ files generated from a specified FASTA file. The script and documentation are at https://github.com/johanzi/fastq_generator

Aerval (Germany) wrote (4.3 years ago):

A review of various tools: http://www.nature.com/doifinder/10.1038/nrg.2016.57


superawesome! Thanks

(Reply by ThePresident, 4.3 years ago)