I'm writing a piece of software to simulate structural variations in a genome. So far, I have written a first version with a simple set of features:
- accepts a FASTA file as input,
- writes the new genome to a new FASTA file and the variations to a VCF file,
- simulates SNPs,
- simulates indels,
- the frequencies of SNPs, insertions and deletions are configurable.
Before going ahead with new features, I would like to hear your feedback and recommendations.
One possible improvement, to make the simulation more realistic, would be to control the frequency of variants depending on the region of the chromosome: for instance, to generate more variants in non-coding regions.
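To make the region idea concrete, here is a rough Python sketch of region-dependent SNP rates. It is only an illustration, not how the tool currently works: the region table, the rate values, and the names `REGIONS`, `DEFAULT_RATE`, `rate_at` and `simulate_snps` are all hypothetical, and it handles substitutions only (no indels). Regions are assumed to be 0-based, half-open intervals, as in a BED file.

```python
import random

# Hypothetical region table: (start, end, per-base SNP rate).
# Coordinates are 0-based and half-open, as in BED.
REGIONS = [
    (0, 20, 0.0),   # e.g. a coding region: no SNPs in this toy example
    (20, 40, 1.0),  # e.g. a non-coding region: every base mutated
]
DEFAULT_RATE = 0.01  # assumed rate for positions outside any listed region


def rate_at(pos, regions=REGIONS, default=DEFAULT_RATE):
    """Return the SNP rate of the first region containing pos."""
    for start, end, rate in regions:
        if start <= pos < end:
            return rate
    return default


def simulate_snps(seq, regions=REGIONS, default=DEFAULT_RATE, seed=None):
    """Return (mutated sequence, list of (1-based pos, ref, alt))."""
    rng = random.Random(seed)
    out = list(seq)
    variants = []
    for pos, ref in enumerate(seq):
        if ref.upper() not in "ACGT":
            continue  # leave N and other ambiguity codes untouched
        if rng.random() < rate_at(pos, regions, default):
            # Pick a base different from the reference base.
            alt = rng.choice([b for b in "ACGT" if b != ref.upper()])
            out[pos] = alt
            variants.append((pos + 1, ref, alt))  # 1-based, like VCF POS
    return "".join(out), variants
```

In practice the region table could be loaded from an annotation file (BED or GFF) rather than hard-coded, and for a whole chromosome the linear scan in `rate_at` would probably need to be replaced by a sorted lookup.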
Thanks in advance for any input!