Hello Biostars community,
I would like to simulate structural variants calling (i.e. genomic inversions, deletions and insertions) in order to understand some experimental results I am getting.
My idea is:
- Generate random DNA sequence of defined length (ex. 1 Mbp) with equal probability of A/T/C/G and store as fasta1
- Manually create genomic inversion/deletion/insertion/duplication etc. and store as fasta2
- Tricky part: Use the sequence from fasta2 and generate random paired-end data with fastq format (thus generating random but unique header, sequence of defined length derived from fasta2 with highest quality). These paired-end "reads" would also need to have a defined insert length (let's say 500bp with some standard deviation).
Since my knowledge in coding is basic-next-to-nothing, I am not sure if this is actually possible and have no idea if I should use R, Python or...? Any help or existing scripts would be highly appreciated.
Thank you in advance.