Question: Can You Make Some Recommendations For An Indel/SNP/CNV Simulator?
7.9 years ago by
Pascal1.5k wrote:


I'm writing a piece of software to simulate structural variations in a genome. So far, I have written a first version with a simple set of features:

  • accept a FASTA file as input,
  • write the new genome into a new FASTA file and the variations into a VCF file,
  • simulate SNP,
  • simulate indels,
  • the frequencies of SNPs, insertions and deletions are configurable.
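
A minimal sketch of the feature set listed above, assuming an uppercase reference sequence already read from the FASTA file. The function names (`simulate_variants`, `to_vcf_lines`), the default rates and the `max_indel_len` parameter are illustrative assumptions, not the actual implementation discussed in this thread. Indels are anchored on the preceding base, as VCF expects.

```python
import random

BASES = "ACGT"

def simulate_variants(ref, snp_rate=0.01, ins_rate=0.001, del_rate=0.001,
                      max_indel_len=5, seed=None):
    """Return (mutated_sequence, variants) for a reference string.

    variants is a list of (pos, ref_allele, alt_allele) with 1-based
    positions; indels keep the anchor base in both alleles.
    """
    rng = random.Random(seed)
    out, variants = [], []
    i, n = 0, len(ref)
    while i < n:
        base = ref[i]
        r = rng.random()
        if r < snp_rate:
            # SNP: replace the base with a different one
            alt = rng.choice([b for b in BASES if b != base])
            out.append(alt)
            variants.append((i + 1, base, alt))
            i += 1
        elif r < snp_rate + ins_rate:
            # Insertion: random bases added after the anchor base
            ins = "".join(rng.choice(BASES)
                          for _ in range(rng.randint(1, max_indel_len)))
            out.append(base + ins)
            variants.append((i + 1, base, base + ins))
            i += 1
        elif r < snp_rate + ins_rate + del_rate and i + 1 < n:
            # Deletion: drop the bases that follow the anchor base
            length = min(rng.randint(1, max_indel_len), n - i - 1)
            out.append(base)
            variants.append((i + 1, ref[i:i + 1 + length], base))
            i += 1 + length
        else:
            out.append(base)
            i += 1
    return "".join(out), variants

def to_vcf_lines(chrom, variants):
    """Format variants as minimal VCF text lines (no INFO or genotypes)."""
    header = ["##fileformat=VCFv4.2",
              "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    return header + [f"{chrom}\t{pos}\t.\t{r}\t{a}\t.\t.\t."
                     for pos, r, a in variants]
```

Passing a fixed `seed` makes a run reproducible, which helps when comparing the output FASTA against the VCF.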

Before going ahead with new features, I would like to hear your feedback and recommendations.

I thought that one possible improvement (to make it more realistic) would be to control the frequency of variants depending on the region of the chromosome: for instance, to generate more variants in non-coding regions.
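
One way this could look, as a rough sketch: each region carries a multiplier applied to a base SNP rate, and positions are then sampled independently. The `(start, end, multiplier)` tuples, the 0-based half-open coordinates and both function names are assumptions made for illustration.

```python
import random

def position_rates(length, base_rate, regions):
    """Per-base SNP probability for a sequence of the given length.

    regions: iterable of (start, end, multiplier), 0-based half-open.
    Later regions override earlier ones where they overlap.
    """
    rates = [base_rate] * length
    for start, end, mult in regions:
        for i in range(max(0, start), min(end, length)):
            rates[i] = min(1.0, base_rate * mult)  # cap at probability 1
    return rates

def sample_snp_positions(rates, seed=None):
    """Return the 0-based positions chosen to carry a SNP."""
    rng = random.Random(seed)
    return [i for i, p in enumerate(rates) if rng.random() < p]
```

For example, a multiplier below 1 on coding regions and above 1 on non-coding regions would shift variants toward the latter.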

Thanks in advance for any input!

7.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

some trivial suggestions:

  • more SNPs in the introns, non-coding regions
  • more SNPs in the less conserved regions ( see the GERP score)
  • more synonymous SNPs in the exons.

Thanks for your answer. To determine whether my REF base is in an intron or an exon, I plan to use BED files generated from the UCSC Table Browser. Do you think it is a good idea?
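
A sketch of what such a lookup might look like, assuming the BED file holds exon intervals and fits in memory. `load_bed` and `in_bed` are hypothetical helper names. BED coordinates are 0-based and half-open, so a 1-based VCF POS has to be passed as `POS - 1`.

```python
import bisect
from collections import defaultdict

def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with merged intervals."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = []
        for s, e in intervals:
            if merged and s <= merged[-1][1]:
                # overlapping or adjacent: extend the previous interval
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def in_bed(index, chrom, pos0):
    """True if the 0-based position pos0 lies inside an interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    k = bisect.bisect_right(starts, pos0) - 1  # last interval starting <= pos0
    return k >= 0 and pos0 < ends[k]
```

Merging the intervals first keeps the binary search valid when transcripts overlap.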

written 7.9 years ago by Pascal

BTW Pierre, in order to generate synonymous SNPs I have to read groups of 3 nucleotides from the genome (then check whether a base change would be synonymous or not) and also detect frameshifts. Is there anything else I should take care of if I decide to implement simulation of sSNPs?
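
The codon check described here could be sketched as follows, assuming an in-frame, forward-strand coding sequence and the standard genetic code. `synonymous_alts` and `is_frameshift` are illustrative names. The sketch ignores reverse-strand genes and codons split across exon boundaries, which a real implementation would need to handle.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_alts(cds, offset):
    """Return the alt bases at cds[offset] that keep the amino acid unchanged.

    cds: in-frame coding sequence (uppercase); offset: 0-based index into it.
    """
    start = offset - offset % 3          # first base of the enclosing codon
    codon = cds[start:start + 3]
    if codon not in CODON_TABLE:         # incomplete codon or non-ACGT base
        return []
    k = offset - start
    alts = []
    for b in BASES:
        if b == codon[k]:
            continue
        mutated = codon[:k] + b + codon[k + 1:]
        if CODON_TABLE[mutated] == CODON_TABLE[codon]:
            alts.append(b)
    return alts

def is_frameshift(ref_allele, alt_allele):
    """True if an indel changes the length by a non-multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0
```

A simulator could draw the alt base from `synonymous_alts` when it wants a silent change, and fall back to any other base otherwise.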

written 7.8 years ago by Pascal

You could use the human codon frequencies?

written 7.8 years ago by Pierre Lindenbaum
7.9 years ago by
United States
lh331k wrote:

To the best of my knowledge, no one knows how to simulate CNV/SV/indel to an acceptable accuracy. If you really want to get a good simulation, you can simulate reads from a de novo assembly and then get the positions of simulated reads from an assembly-to-assembly alignment. This procedure is quite complicated, but is closest to the truth.


Thanks for your answer. It is interesting, but for the time being I just can't invest too much time :-S I'm looking for a trade-off between the complexity of the algorithm and closeness to reality.

written 7.9 years ago by Pascal